[jira] [Commented] (ACCUMULO-4561) Crash when using ping on a non-existing server
[ https://issues.apache.org/jira/browse/ACCUMULO-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201366#comment-16201366 ] Christopher Tubbs commented on ACCUMULO-4561:
--
That's interesting. The Thrift client should have a saner failure behavior when the TCP response is not in the expected protocol. Jetty detects the protocol mismatch and reacts reasonably using the only protocol it knows; there's no reason Thrift shouldn't behave similarly. For a client, the response should be to throw an Exception if the protocol doesn't match. For a server, it should respond with an error message in its native protocol, as Jetty did with the HTTP 400 error. This might be something we need to escalate to the upstream Thrift developers, if it's not something already built into the Thrift client that we're not handling properly. The next step should be to check whether our client code is failing to properly handle a relevant exception coming from the Thrift library.

> Crash when using ping on a non-existing server
> ----------------------------------------------
>
> Key: ACCUMULO-4561
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4561
> Project: Accumulo
> Issue Type: Bug
> Components: shell
> Affects Versions: 2.0.0
> Reporter: Luis Tavarez
>
> While working on ACCUMULO-4558, I tried running
> {code}ping -ts localhost:9995{code} (localhost:9995 does not have a tserver
> on my setup.)
> And it caused the shell to exit (crashed) and show the following message.
> {code}#
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill -9 %p"
> # Executing /bin/sh -c "kill -9 25561"...
> /home/lmtavar/git/uno/bin/uno: line 44: 25561 Killed
> "$ACCUMULO_HOME"/bin/accumulo shell -u "$ACCUMULO_USER" -p
> "$ACCUMULO_PASSWORD" "${@:2}"
> {code}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ACCUMULO-4718) Accumulo-testing classes are broken
[ https://issues.apache.org/jira/browse/ACCUMULO-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201026#comment-16201026 ] Michael Miller commented on ACCUMULO-4718:
--
Well, not quite... the accumulo-testing client appears to be configured properly, but when it calls TableOperations().importDirectory() in BulkPlusOne, it is out of sync. The call to TableOperations causes a FileNotFoundException when importDirectory() checks the path against the Hadoop FileSystem that it pulls from CachedConfiguration.getInstance(). The fs I have set in the accumulo-testing client returns "hdfs://localhost:8020" while the FileSystem in TableOperations returns "file:///". Here is the stacktrace:

{code}
2017-10-11 14:14:51,830 [randomwalk.bulk.BulkPlusOne] ERROR: org.apache.accumulo.core.client.AccumuloException: Bulk import directory /tmp/bulk_f96ee0a1-e125-4b00-acf2-10a774cdfe57 does not exist!
org.apache.accumulo.core.client.AccumuloException: Bulk import directory /tmp/bulk_f96ee0a1-e125-4b00-acf2-10a774cdfe57 does not exist!
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.checkPath(TableOperationsImpl.java:1101)
	at org.apache.accumulo.core.client.impl.TableOperationsImpl.importDirectory(TableOperationsImpl.java:1123)
	at org.apache.accumulo.testing.core.randomwalk.bulk.BulkPlusOne.bulkLoadLots(BulkPlusOne.java:102)
	at org.apache.accumulo.testing.core.randomwalk.bulk.BulkPlusOne.runLater(BulkPlusOne.java:120)
	at org.apache.accumulo.testing.core.randomwalk.bulk.BulkTest.lambda$visit$0(BulkTest.java:31)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
	at java.lang.Thread.run(Thread.java:748)
{code}

Meanwhile, if I connect to the Uno instance I have running, I can see that directory and its contents.

> Accumulo-testing classes are broken
> -----------------------------------
>
> Key: ACCUMULO-4718
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4718
> Project: Accumulo
> Issue Type: Bug
> Reporter: Michael Miller
> Assignee: Michael Miller
> Fix For: 2.0.0
>
> Multiple changes to 2.0 over the past 6 months or so have been left out of
> the accumulo-testing repo. Update the testing classes with these changes so
> they can be run against 2.0
[jira] [Assigned] (ACCUMULO-4170) ClientConfiguration javadoc difficult to read
[ https://issues.apache.org/jira/browse/ACCUMULO-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Owens reassigned ACCUMULO-4170:
Assignee: Mark Owens

> ClientConfiguration javadoc difficult to read
> ---------------------------------------------
>
> Key: ACCUMULO-4170
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4170
> Project: Accumulo
> Issue Type: Bug
> Components: client, docs
> Affects Versions: 1.7.1
> Reporter: Mike Drob
> Assignee: Mark Owens
> Priority: Trivial
> Labels: newbie
>
> The docs displayed on
> https://accumulo.apache.org/1.7/apidocs/org/apache/accumulo/core/client/ClientConfiguration.html#loadDefault%28%29
> are difficult to read because the list is displayed in-line. We could use
> proper list formatting to improve readability.
[GitHub] joshelser commented on issue #305: [ACCUMULO-4591] Add replication latency metrics
joshelser commented on issue #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#issuecomment-335938107 No problem, dude. Thanks for the contribution! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] adamjshook commented on issue #305: [ACCUMULO-4591] Add replication latency metrics
adamjshook commented on issue #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#issuecomment-335930968 @joshelser Thanks, if it wouldn't be too much of a hassle.
[jira] [Commented] (ACCUMULO-4718) Accumulo-testing classes are broken
[ https://issues.apache.org/jira/browse/ACCUMULO-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200722#comment-16200722 ] Michael Miller commented on ACCUMULO-4718:
--
I solved the configuration issue... conf/accumulo-testing.properties requires the HDFS root path to be set properly:

{noformat}
# HDFS root path. Should match 'fs.defaultFS' property in Hadoop's core-site.xml
test.common.hdfs.root=hdfs://localhost:8020
{noformat}

It looks like this requirement could be eliminated with completion of ACCUMULO-4717.
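For reference, the Hadoop side of that pairing is the fs.defaultFS property in core-site.xml; a minimal fragment looks like the following (hdfs://localhost:8020 is just the common single-node default used above, and the file location depends on your deployment):

```xml
<!-- $HADOOP_HOME/etc/hadoop/core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
```

When test.common.hdfs.root does not match this value, the testing client resolves bulk-import paths against the wrong FileSystem (e.g. file:///), which matches the mismatch described earlier in this issue.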
[GitHub] adamjshook commented on issue #305: [ACCUMULO-4591] Add replication latency metrics
adamjshook commented on issue #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#issuecomment-335895480 @joshelser Back at you!
[jira] [Comment Edited] (ACCUMULO-4561) Crash when using ping on a non-existing server
[ https://issues.apache.org/jira/browse/ACCUMULO-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200425#comment-16200425 ] Mark Owens edited comment on ACCUMULO-4561 at 10/11/17 5:51 PM:
--
These crashes appear to occur when sending a ping request to ports that have Jetty listening. I ran an nmap scan on my local machine looking for open ports and then ran the accumulo shell ping command against each open port (closed ports return connection refused). Note that all these tests were run on 2.0.0-SNAPSHOT. My results are listed below:

{noformat}
TServer port on local instance:
9997/tcp   open  palace-6?
>>> localhost:9997:OK

Following ports all returned the same response:
2181/tcp   open  eforward?
4560/tcp   open  unknown
5355/tcp   open  llmnr?
8030/tcp   open  hadoop-ipc  Hadoop IPC
8031/tcp   open  hadoop-ipc  Hadoop IPC
8032/tcp   open  hadoop-ipc  Hadoop IPC
8033/tcp   open  hadoop-ipc  Hadoop IPC
8040/tcp   open  hadoop-ipc  Hadoop IPC
9000/tcp   open  hadoop-ipc  Hadoop IPC
34737/tcp  open  unknown
39473/tcp  open  hadoop-ipc  Hadoop IPC
50010/tcp  open  unknown
50020/tcp  open  hadoop-ipc  Hadoop IPC
>>> localhost:8031 ERROR org.apache.thrift.transport.TTransportException

9998/tcp   open  distinct32?
/tcp  open  abyss?
10001/tcp  open  scp-config?
>>> localhost:9998 ERROR org.apache.thrift.TApplicationException: Invalid method name: 'getTabletServerStatus'

13562/tcp  open  unknown
>>> localhost:13562 ERROR org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:36716 remote=localhost/127.0.0.1:13562]

Jetty ports:
8042/tcp   open  httpJetty 6.1.26
8088/tcp   open  httpJetty 6.1.26
9995/tcp   open  httpJetty 9.3.21.v20170918
44263/tcp  open  httpJetty 6.1.26
50070/tcp  open  httpJetty 6.1.26
50090/tcp  open  httpJetty 6.1.26
>>> #
>>> # java.lang.OutOfMemoryError: Java heap space
>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>> # Executing /bin/sh -c "kill -9 7693"...
>>> Killed

This port returned a different response after a timeout:
50075/tcp  open  httpJetty 6.1.26
>>> localhost:50075 ERROR org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:37190 remote=localhost/127.0.0.1:50075]
{noformat}

I have no feel for how often the 'ping -ts' command is run, or how often it would be given an invalid port. I would assume a user would only supply a port they suspected to be a tserver, so I suspect this situation would not arise very often.

I also noticed that if I stop the tablet servers after I'm in the shell and then run the ping command, the shell never returns the prompt to the user. Ctrl-C'ing at that point exits the shell as well. I would think that should be fixed, since the purpose of ping is to retrieve the status of a tablet server. Has that behavior been documented and/or verified previously?
[jira] [Commented] (ACCUMULO-4561) Crash when using ping on a non-existing server
[ https://issues.apache.org/jira/browse/ACCUMULO-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200641#comment-16200641 ] Mark Owens commented on ACCUMULO-4561:
--
A little more info: the monitor.log records the following warning when the ping command targets a Jetty port:

{noformat}
2017-10-11 13:09:25,039 [http.HttpParser] WARN : Illegal character 0x0 in state=START for buffer HeapByteBuffer@3249e354[p=1,l=179,c=8192,r=178]={\x00<<<\x00\x00\xAf\x82!\x01\x15getTabletS...d92138d\x00,\x16\x00\x16\x00\x00\x00>>>\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00}
2017-10-11 13:09:25,039 [http.HttpParser] WARN : bad HTTP parsed: 400 Illegal character 0x0 for HttpChannelOverHttp@4676385e{r=0,c=false,a=IDLE,uri=null}
{noformat}

The returned HTTP payload (via wireshark) is:

{noformat}
0000  48 54 54 50 2f 31 2e 31 20 34 30 30 20 49 6c 6c  HTTP/1.1 400 Ill
0010  65 67 61 6c 20 63 68 61 72 61 63 74 65 72 20 30  egal character 0
0020  78 30 0d 0a 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67  x0..Content-Leng
0030  74 68 3a 20 30 0d 0a 43 6f 6e 6e 65 63 74 69 6f  th: 0..Connectio
0040  6e 3a 20 63 6c 6f 73 65 0d 0a 53 65 72 76 65 72  n: close..Server
0050  3a 20 4a 65 74 74 79 28 39 2e 33 2e 32 31 2e 76  : Jetty(9.3.21.v
0060  32 30 31 37 30 39 31 38 29 0d 0a 0d 0a           20170918)
{noformat}
[jira] [Commented] (ACCUMULO-4561) Crash when using ping on a non-existing server
[ https://issues.apache.org/jira/browse/ACCUMULO-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200561#comment-16200561 ] Keith Turner commented on ACCUMULO-4561:
--
The ping command makes a thrift connection to the destination. In this case the destination does not speak thrift, and it sends something back; I'm not sure what. I suspect the response from Jetty is read by the thrift code and used to try to allocate a byte array that exceeds available memory. My memory is fuzzy on this, but I think on the server side in Accumulo we use the Thrift framed transport and limit the maximum byte array it will allocate. I think this is configurable. I am not sure if this can be done on the thrift client side, though.
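The theory above is easy to demonstrate without Thrift at all: a framed transport reads a 4-byte big-endian length prefix and then allocates a buffer of that size, and the first four bytes of any HTTP response are the ASCII characters "HTTP", which decode to roughly 1.2 GB. A minimal sketch of the guard a size-limited framed transport can apply, using only the JDK (the FrameCheck class and MAX_FRAME_SIZE cap are hypothetical names for illustration, not Thrift's or Accumulo's actual API):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FrameCheck {
    static final int MAX_FRAME_SIZE = 16 * 1024 * 1024; // illustrative 16 MB cap

    // A framed transport reads a 4-byte big-endian length prefix before
    // allocating the frame buffer. Refuse absurd sizes instead of allocating.
    static int readFrameSize(byte[] lengthPrefix) {
        int size = ByteBuffer.wrap(lengthPrefix).getInt();
        if (size < 0 || size > MAX_FRAME_SIZE) {
            throw new IllegalStateException(
                "frame size " + size + " exceeds limit " + MAX_FRAME_SIZE);
        }
        return size;
    }

    public static void main(String[] args) {
        // Jetty's reply begins with the ASCII bytes "HTTP". Read as a frame
        // length they decode to 1,213,486,160 (~1.1 GiB); allocating that
        // blindly is one plausible route to the OutOfMemoryError in this issue.
        byte[] reply = "HTTP/1.1 400 Illegal character 0x0".getBytes(StandardCharsets.US_ASCII);
        int bogus = ByteBuffer.wrap(reply, 0, 4).getInt();
        System.out.println("\"HTTP\" as a frame length: " + bogus + " bytes");
        try {
            readFrameSize(new byte[] { reply[0], reply[1], reply[2], reply[3] });
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If the client-side transport enforced such a cap, the ping against a Jetty port would fail with an exception instead of exhausting the heap.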
[GitHub] joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#discussion_r144061407 ## File path: server/master/src/main/java/org/apache/accumulo/master/metrics/Metrics2ReplicationMetrics.java ## @@ -124,4 +142,55 @@ protected int getNumConfiguredPeers() { protected int getMaxReplicationThreads() { return replicationUtil.getMaxReplicationThreads(master.getMasterMonitorInfo()); } + + protected void addReplicationQueueTimeMetrics() { +// Exit early if replication table is offline +if (TableState.ONLINE != Tables.getTableState(master.getInstance(), ReplicationTable.ID)) { + return; +} + +// Exit early if we have no replication peers configured +if (replicationUtil.getPeers().isEmpty()) { + return; +} + +Set paths = replicationUtil.getPendingReplicationPaths(); + +// We'll take a snap of the current time and use this as a diff between any deleted +// file's modification time and now. The reported latency will be off by at most a +// number of seconds equal to the metric polling period +long currentTime = System.currentTimeMillis(); + +// Iterate through all the pending paths and update the mod time if we don't know it yet +for (Path path : paths) { + if (!pathModTimes.containsKey(path)) { +try { + pathModTimes.put(path, master.getFileSystem().getFileStatus(path).getModificationTime()); +} catch (IOException e) { + // Ignore all IOExceptions + // Either the system is unavailable or the file was deleted + // since the initial scan and this check +} + } +} + +// Remove all currently pending files +Set deletedPaths = new HashSet<>(pathModTimes.keySet()); +deletedPaths.removeAll(paths); + +// Exit early if we have no replicated files to report on +if (deletedPaths.isEmpty()) { + return; +} + +replicationQueueTimeStat.resetMinMax(); + +for (Path path : deletedPaths) { + // Remove this path and add the latency + long modTime = pathModTimes.remove(path); Review comment: It's just the 
unboxing you really have to worry about. Something like the following would be fine:

```java
Long modTime = pathModTimes.remove(path);
if (modTime != null) {
  ...
}
```

Java will handle the unboxing for you once you've guaranteed that `modTime` is definitely non-null.
[GitHub] adamjshook commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
adamjshook commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#discussion_r144060426

## File path: server/master/src/main/java/org/apache/accumulo/master/metrics/Metrics2ReplicationMetrics.java ##

```java
long modTime = pathModTimes.remove(path);
```

Review comment: I was actually thinking how best to handle this. It'd be an implementation error if the path to be removed wasn't in `pathModTimes`, since `deletedPaths` is a subset of `pathModTimes`. Maybe check for existence and log an error if it doesn't exist?
[GitHub] joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#discussion_r144058257

## File path: server/master/src/main/java/org/apache/accumulo/master/metrics/Metrics2ReplicationMetrics.java ##

```java
long modTime = pathModTimes.remove(path);
```

Review comment: This will throw an error if `pathModTimes` ever does not have a mapping for `path`, because you're unboxing the `Long` into a `long`.
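The unboxing hazard being described can be reproduced in miniature (String keys stand in for Hadoop Path objects; the map name mirrors the pathModTimes field from the diff above):

```java
import java.util.HashMap;
import java.util.Map;

public class UnboxingDemo {
    public static void main(String[] args) {
        Map<String, Long> pathModTimes = new HashMap<>();
        pathModTimes.put("/tmp/bulk/file1.rf", 1_507_730_000_000L);

        // Safe pattern: keep the boxed Long and null-check before unboxing.
        Long modTime = pathModTimes.remove("/tmp/bulk/file1.rf");
        if (modTime != null) {
            long queueTime = 1_507_730_005_000L - modTime; // unboxes after the null check
            System.out.println("queueTime=" + queueTime);  // queueTime=5000
        }

        // Hazardous pattern: remove() returns null for an absent key, and
        // assigning that null to a primitive long auto-unboxes via
        // Long.longValue(), throwing a NullPointerException.
        try {
            long missing = pathModTimes.remove("/tmp/bulk/no-such-file.rf");
            System.out.println(missing); // never reached
        } catch (NullPointerException e) {
            System.out.println("NPE from unboxing an absent mapping");
        }
    }
}
```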
[GitHub] joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#discussion_r144058388

## File path: server/master/src/main/java/org/apache/accumulo/master/metrics/Metrics2ReplicationMetrics.java ##

```java
protected void addReplicationQueueTimeMetrics() {
```

Review comment: Could you add a simple test case for this, please?
[GitHub] joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#discussion_r144058516 ## File path: server/master/src/main/java/org/apache/accumulo/master/metrics/Metrics2ReplicationMetrics.java ## @@ -26,38 +29,51 @@ import org.apache.accumulo.master.Master; import org.apache.accumulo.server.metrics.Metrics; import org.apache.accumulo.server.replication.ReplicationUtil; +import org.apache.hadoop.fs.Path; import org.apache.hadoop.metrics2.MetricsCollector; import org.apache.hadoop.metrics2.MetricsRecordBuilder; import org.apache.hadoop.metrics2.MetricsSource; import org.apache.hadoop.metrics2.MetricsSystem; import org.apache.hadoop.metrics2.lib.Interns; import org.apache.hadoop.metrics2.lib.MetricsRegistry; +import org.apache.hadoop.metrics2.lib.MutableQuantiles; +import org.apache.hadoop.metrics2.lib.MutableStat; /** * */ public class Metrics2ReplicationMetrics implements Metrics, MetricsSource { public static final String NAME = MASTER_NAME + ",sub=Replication", DESCRIPTION = "Data-Center Replication Metrics", CONTEXT = "master", RECORD = "MasterReplication"; - public static final String PENDING_FILES = "filesPendingReplication", NUM_PEERS = "numPeers", MAX_REPLICATION_THREADS = "maxReplicationThreads"; + public static final String PENDING_FILES = "filesPendingReplication", NUM_PEERS = "numPeers", MAX_REPLICATION_THREADS = "maxReplicationThreads", + REPLICATION_QUEUE_TIME_QUANTILES = "replicationQueue10m", REPLICATION_QUEUE_TIME = "replicationQueue"; private final Master master; private final MetricsSystem system; private final MetricsRegistry registry; private final ReplicationUtil replicationUtil; + private final MutableQuantiles replicationQueueTimeQuantiles; + private final MutableStat replicationQueueTimeStat; + private final Map pathModTimes; Metrics2ReplicationMetrics(Master master, MetricsSystem system) { this.master = master; this.system = system; +pathModTimes = new 
HashMap<>(); + registry = new MetricsRegistry(Interns.info(NAME, DESCRIPTION)); replicationUtil = new ReplicationUtil(master); +replicationQueueTimeQuantiles = registry.newQuantiles(REPLICATION_QUEUE_TIME_QUANTILES, "replication queue time quantiles in milliseconds", "ops",

Review comment: nit: capitalize "Replication", please.
[GitHub] joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics URL: https://github.com/apache/accumulo/pull/305#discussion_r144057720

## File path: server/master/src/main/java/org/apache/accumulo/master/metrics/Metrics2ReplicationMetrics.java ##

```java
} catch (IOException e) {
  // Ignore all IOExceptions
  // Either the system is unavailable or the file was deleted
  // since the initial scan and this check
}
```

Review comment: Add a `log.trace` in case you/someone ever runs into trouble and wants to try to debug this.
[GitHub] joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
joshelser commented on a change in pull request #305: [ACCUMULO-4591] Add replication latency metrics
URL: https://github.com/apache/accumulo/pull/305#discussion_r144058956

File path: server/master/src/main/java/org/apache/accumulo/master/metrics/Metrics2ReplicationMetrics.java

@@ -26,38 +29,51 @@
 import org.apache.accumulo.master.Master;
 import org.apache.accumulo.server.metrics.Metrics;
 import org.apache.accumulo.server.replication.ReplicationUtil;
+import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.metrics2.MetricsCollector;
 import org.apache.hadoop.metrics2.MetricsRecordBuilder;
 import org.apache.hadoop.metrics2.MetricsSource;
 import org.apache.hadoop.metrics2.MetricsSystem;
 import org.apache.hadoop.metrics2.lib.Interns;
 import org.apache.hadoop.metrics2.lib.MetricsRegistry;
+import org.apache.hadoop.metrics2.lib.MutableQuantiles;
+import org.apache.hadoop.metrics2.lib.MutableStat;

 /**
  *
  */
 public class Metrics2ReplicationMetrics implements Metrics, MetricsSource {
   public static final String NAME = MASTER_NAME + ",sub=Replication", DESCRIPTION = "Data-Center Replication Metrics", CONTEXT = "master", RECORD = "MasterReplication";
-  public static final String PENDING_FILES = "filesPendingReplication", NUM_PEERS = "numPeers", MAX_REPLICATION_THREADS = "maxReplicationThreads";
+  public static final String PENDING_FILES = "filesPendingReplication", NUM_PEERS = "numPeers", MAX_REPLICATION_THREADS = "maxReplicationThreads",
+      REPLICATION_QUEUE_TIME_QUANTILES = "replicationQueue10m", REPLICATION_QUEUE_TIME = "replicationQueue";

   private final Master master;
   private final MetricsSystem system;
   private final MetricsRegistry registry;
   private final ReplicationUtil replicationUtil;
+  private final MutableQuantiles replicationQueueTimeQuantiles;
+  private final MutableStat replicationQueueTimeStat;
+  private final Map pathModTimes;

   Metrics2ReplicationMetrics(Master master, MetricsSystem system) {
     this.master = master;
     this.system = system;
+    pathModTimes = new HashMap<>();
+
     registry = new MetricsRegistry(Interns.info(NAME, DESCRIPTION));
     replicationUtil = new ReplicationUtil(master);
+    replicationQueueTimeQuantiles = registry.newQuantiles(REPLICATION_QUEUE_TIME_QUANTILES, "replication queue time quantiles in milliseconds", "ops",
+        "latency", 600);
+    replicationQueueTimeStat = registry.newStat(REPLICATION_QUEUE_TIME, "replication queue time stat in milliseconds", "ops", "latency", true);

Review comment: nit: "statistics" instead of "stat" (and capitalize "Replication" too)
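The queue-time bookkeeping under review (the pathModTimes map feeding the new stat) can be sketched in isolation. This is a simplified, hypothetical illustration, not the actual Metrics2ReplicationMetrics code: the class and method names below are invented, and the Hadoop Metrics2 registry calls are replaced by plain return values so the sketch is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: remember each file's modification time the first
// time it is seen pending replication, and report the elapsed queue time
// when it finishes replicating. In the real patch these durations would
// feed a MutableQuantiles and a MutableStat instead of being returned.
public class QueueTimeTracker {
    private final Map<String, Long> pathModTimes = new HashMap<>();

    // Called when a file is observed in the pending-replication set.
    public void filePending(String path, long modTimeMillis) {
        // Keep the earliest observed mod time; later sightings don't reset it.
        pathModTimes.putIfAbsent(path, modTimeMillis);
    }

    // Called when a file is no longer pending. Returns queue time in
    // milliseconds, or -1 if the file was never tracked.
    public long fileReplicated(String path, long nowMillis) {
        Long modTime = pathModTimes.remove(path);
        return modTime == null ? -1L : nowMillis - modTime;
    }

    public static void main(String[] args) {
        QueueTimeTracker tracker = new QueueTimeTracker();
        tracker.filePending("/accumulo/wal/f1", 1_000L);
        tracker.filePending("/accumulo/wal/f1", 2_000L); // ignored: already tracked
        System.out.println(tracker.fileReplicated("/accumulo/wal/f1", 5_000L)); // 4000
    }
}
```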
[jira] [Commented] (ACCUMULO-4591) Replication Latency Metrics2
[ https://issues.apache.org/jira/browse/ACCUMULO-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200503#comment-16200503 ] Adam J Shook commented on ACCUMULO-4591: [~elserj], would you have time in the near future to take a look at the new PR?
> Replication Latency Metrics2
>
> Key: ACCUMULO-4591
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4591
> Project: Accumulo
> Issue Type: Improvement
> Reporter: Noe
> Assignee: Adam J Shook
> Labels: pull-request-available
> Fix For: 2.0.0
>
> Attachments: Screen Shot 2017-02-23 at 9.31.06 AM.png
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> Currently, Files Pending Replication is the only available insight into the state of replication. Latency of replication has been a great concern. Without a latency metric, users cannot determine which configuration settings reduce or increase replication latency.
[jira] [Comment Edited] (ACCUMULO-4561) Crash when using ping on a non-existing server
[ https://issues.apache.org/jira/browse/ACCUMULO-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200425#comment-16200425 ] Mark Owens edited comment on ACCUMULO-4561 at 10/11/17 3:10 PM:
These crashes appear to occur when sending a ping request to ports that have Jetty listening. I ran an nmap scan on my local machine looking for open ports, then ran the accumulo shell ping command against each open port (closed ports return connection refused). Note that all of these tests were run on the 2.0.0-SNAPSHOT. My results are listed below:
{{TServer port on local instance:
9997/tcp open palace-6?
>>> localhost:9997:OK

The following ports all returned the same response:
2181/tcp open eforward?
4560/tcp open unknown
5355/tcp open llmnr?
8030/tcp open hadoop-ipc Hadoop IPC
8031/tcp open hadoop-ipc Hadoop IPC
8032/tcp open hadoop-ipc Hadoop IPC
8033/tcp open hadoop-ipc Hadoop IPC
8040/tcp open hadoop-ipc Hadoop IPC
9000/tcp open hadoop-ipc Hadoop IPC
34737/tcp open unknown
39473/tcp open hadoop-ipc Hadoop IPC
50010/tcp open unknown
50020/tcp open hadoop-ipc Hadoop IPC
>>> localhost:8031 ERROR org.apache.thrift.transport.TTransportException

9998/tcp open distinct32?
/tcp open abyss?
10001/tcp open scp-config?
>>> localhost:9998 ERROR org.apache.thrift.TApplicationException: Invalid method name: 'getTabletServerStatus'

13562/tcp open unknown
>>> localhost:13562 ERROR org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:36716 remote=localhost/127.0.0.1:13562]

Jetty ports:
8042/tcp open httpJetty 6.1.26
8088/tcp open httpJetty 6.1.26
9995/tcp open httpJetty 9.3.21.v20170918
44263/tcp open httpJetty 6.1.26
50070/tcp open httpJetty 6.1.26
50090/tcp open httpJetty 6.1.26
>>> #
>>> # java.lang.OutOfMemoryError: Java heap space
>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>> # Executing /bin/sh -c "kill -9 7693"...
>>> Killed

This port returned a different response after a timeout:
50075/tcp open httpJetty 6.1.26
>>> localhost:50075 ERROR org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:37190 remote=localhost/127.0.0.1:50075]}}
I have no feel for how often the 'ping -ts ' command is run, or how often it would be given an invalid port. I would assume a user would only supply a port they suspected belonged to a tserver, so I suspect this situation would not happen very often.
I also noticed that if I stop the tablet servers after I'm in the shell and then run the ping command, the shell never returns the prompt to the user. Ctrl-C'ing at that point exits the shell as well. I would think that should be fixed, since the purpose of the ping is to retrieve the status of a tablet server. Has that behavior been documented and/or verified previously?
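A note on the OutOfMemoryError seen against the Jetty ports above: one plausible mechanism, assuming the client side uses a Thrift framed transport, is that the first four bytes of the peer's HTTP reply get read as a big-endian frame length. The ASCII bytes "HTTP" decode to roughly 1.2 GB, which the client would then try to buffer. The class below is illustrative only; it is not the actual Accumulo or Thrift code path, just a demonstration of the byte arithmetic.

```java
// Illustrative sketch: decode the first 4 bytes of a response as a
// big-endian frame length, the way a framed transport would.
public class FrameSizeDemo {
    static int frameLength(byte[] header) {
        // Big-endian 32-bit integer from the first four bytes.
        return ((header[0] & 0xff) << 24)
             | ((header[1] & 0xff) << 16)
             | ((header[2] & 0xff) << 8)
             |  (header[3] & 0xff);
    }

    public static void main(String[] args) {
        // A Jetty HTTP error reply starts with the ASCII bytes "HTTP".
        byte[] httpReply = "HTTP/1.1 400 Bad Request".getBytes();
        byte[] first4 = {httpReply[0], httpReply[1], httpReply[2], httpReply[3]};
        // 0x48545450 = 1213486160 bytes, about 1.13 GiB -- a buffer that
        // large would plausibly blow a default shell heap.
        System.out.println(frameLength(first4)); // 1213486160
    }
}
```

If this is the failure mode, a frame-size sanity limit on the client (or the upstream Thrift handling Christopher mentioned) would turn the OOM into an ordinary exception.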
[jira] [Created] (ACCUMULO-4721) Document rfile-info in the user manual
Michael Wall created ACCUMULO-4721:
Summary: Document rfile-info in the user manual
Key: ACCUMULO-4721
URL: https://issues.apache.org/jira/browse/ACCUMULO-4721
Project: Accumulo
Issue Type: Bug
Affects Versions: 1.8.1, 1.7.3, 2.0.0
Reporter: Michael Wall
Priority: Trivial

Currently the 'old school' PrintInfo is documented at http://accumulo.apache.org/1.8/accumulo_user_manual.html#_tools. We should also document the 'rfile-info' command, which is easier to remember than org.apache.accumulo.core.file.rfile.PrintInfo.