[jira] [Commented] (ACCUMULO-4331) Make port configuration and allocation consistent across services

2016-06-07 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318900#comment-15318900
 ] 

Dave Marion commented on ACCUMULO-4331:
---

It makes it hard to lock ports down on a server (iptables)

> Make port configuration and allocation consistent across services
> -
>
> Key: ACCUMULO-4331
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4331
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Dave Marion
> Fix For: 1.8.0
>
>
> There was some discussion in ACCUMULO-4328 about ports, so I decided to track 
> down how the client ports are configured and allocated. Issues raised in the 
> discussion were:
>  1. The port search feature was not well understood
>  2. Ephemeral port allocation makes it hard to lock servers down (e.g. 
> iptables)
> Looking through the code, I found that the following properties allocate a 
> port number based on conf.getPort(). This returns the port number from the 
> property and supports either a single value or zero. Then, in the server 
> component (monitor, tracer, gc, etc.), this value is used when creating a 
> ServerSocket. If the port is already in use, the process will fail.
> {noformat}
> monitor.port.log4j
> trace.port.client
> gc.port.client
> monitor.port.client
> {noformat}
> The following properties use TServerUtils.startServer which uses the value in 
> the property to start the TServer. If the value is zero, then it picks a 
> random port between 1024 and 65535. If tserver.port.search is enabled, then 
> it will try a thousand times to bind to a random port.
> {noformat}
> tserver.port.client
> master.port.client
> master.replication.coordinator.port
> replication.receipt.service.port
> {noformat}
> I'm proposing that we deprecate the tserver.port.search property and the use 
> of zero as a value for the properties above. Instead, I think we should 
> allow the user to specify either a single value or a range (M-N). 
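The single-value-or-range behavior proposed above could be sketched as follows. This is a minimal illustration only; `PortSpec`, `parsePortSpec`, and `bindFirstAvailable` are hypothetical names, not Accumulo API:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch of the proposal: a port property accepts either a single port
// ("9997") or an inclusive range ("9997-9999"); the server binds the first
// available port in the list and fails only if none can be bound.
public class PortSpec {
  static int[] parsePortSpec(String spec) {
    int dash = spec.indexOf('-');
    if (dash < 0)
      return new int[] {Integer.parseInt(spec.trim())};
    int lo = Integer.parseInt(spec.substring(0, dash).trim());
    int hi = Integer.parseInt(spec.substring(dash + 1).trim());
    if (lo < 1024 || hi > 65535 || lo > hi)
      throw new IllegalArgumentException("bad port range: " + spec);
    int[] ports = new int[hi - lo + 1];
    for (int i = 0; i < ports.length; i++)
      ports[i] = lo + i;
    return ports;
  }

  static ServerSocket bindFirstAvailable(String spec) throws IOException {
    for (int port : parsePortSpec(spec)) {
      try {
        return new ServerSocket(port); // deterministic, firewall-friendly
      } catch (IOException e) {
        // port in use; try the next candidate in the range
      }
    }
    throw new IOException("Unable to find a listen port in " + spec);
  }
}
```

Because every candidate port comes from a known, bounded range, firewall rules (e.g. iptables) can be written ahead of time, which addresses the lock-down concern raised in the comment above.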



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-4331) Make port configuration and allocation consistent across services

2016-06-07 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318830#comment-15318830
 ] 

Dave Marion commented on ACCUMULO-4331:
---

It couldn't be removed until 2.0. Does that change things?



[jira] [Commented] (ACCUMULO-4329) TabletServer won't start if it cannot reserve replication port

2016-06-06 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317144#comment-15317144
 ] 

Dave Marion commented on ACCUMULO-4329:
---

Yes.

> TabletServer won't start if it cannot reserve replication port
> --
>
> Key: ACCUMULO-4329
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4329
> Project: Accumulo
>  Issue Type: Bug
>  Components: tserver
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>
> TabletServer will not start if it cannot reserve the configured replication 
> port. The code passes `null` for enabling the port search feature, which 
> resolves to `false`, and the TServer fails.





[jira] [Comment Edited] (ACCUMULO-4329) TabletServer won't start if it cannot reserve replication port

2016-06-06 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317137#comment-15317137
 ] 

Dave Marion edited comment on ACCUMULO-4329 at 6/6/16 8:11 PM:
---

Let me walk you through what I am seeing; maybe it's me. TabletServer has the 
following starting at line 2284:

{noformat}
ServerAddress sp = TServerUtils.startServer(this, clientAddress.getHostText(),
    Property.REPLICATION_RECEIPT_SERVICE_PORT, processor,
    "ReplicationServicerHandler", "Replication Servicer", null,
    Property.REPLICATION_MIN_THREADS, Property.REPLICATION_THREADCHECK,
    maxMessageSizeProperty);
{noformat}

TServerUtils.startServer is defined as:
{noformat}
public static ServerAddress startServer(AccumuloServerContext service, String hostname,
    Property portHintProperty, TProcessor processor, String serverName,
    String threadName, Property portSearchProperty, Property minThreadProperty,
    Property timeBetweenThreadChecksProperty, Property maxMessageSizeProperty)
{noformat}

So, TServerUtils.startServer expects the name of the port search property as 
the 7th argument, and in the TabletServer code above, `null` is passed as the 
7th argument. Inside TServerUtils.startServer, the property is used like so:

{noformat}
boolean portSearch = false;
if (portSearchProperty != null)
  portSearch = config.getBoolean(portSearchProperty);
{noformat}
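The consequence of that null argument can be restated as a tiny self-contained sketch. The config lookup is stubbed with a map here; in the real code it is a configuration lookup against the property name. Passing null disables port search unconditionally, no matter what tserver.port.search is set to:

```java
import java.util.Map;

// Stub restating the quoted logic: a null portSearchProperty means
// port search is always off, even if the property is enabled in config.
public class PortSearchDefault {
  static boolean resolvePortSearch(Map<String, Boolean> config, String portSearchProperty) {
    boolean portSearch = false;
    if (portSearchProperty != null)
      portSearch = config.getOrDefault(portSearchProperty, false);
    return portSearch;
  }
}
```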





[jira] [Comment Edited] (ACCUMULO-4329) TabletServer won't start if it cannot reserve replication port

2016-06-06 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317118#comment-15317118
 ] 

Dave Marion edited comment on ACCUMULO-4329 at 6/6/16 8:12 PM:
---

No, the port search works when enabled. I get the following in the monitor log:

{noformat}
Unable to start TServer
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address localhost/127.0.0.1:10002.
    at org.apache.thrift.transport.TNonblockingServerSocket.<init>(TNonblockingServerSocket.java:96)
    at org.apache.thrift.transport.TNonblockingServerSocket.<init>(TNonblockingServerSocket.java:79)
    at org.apache.thrift.transport.TNonblockingServerSocket.<init>(TNonblockingServerSocket.java:75)
    at org.apache.accumulo.server.rpc.TServerUtils.createNonBlockingServer(TServerUtils.java:182)
    at org.apache.accumulo.server.rpc.TServerUtils.startTServer(TServerUtils.java:514)
    at org.apache.accumulo.server.rpc.TServerUtils.startTServer(TServerUtils.java:478)
    at org.apache.accumulo.server.rpc.TServerUtils.startServer(TServerUtils.java:154)
    at org.apache.accumulo.tserver.TabletServer.startReplicationService(TabletServer.java:2284)
    at org.apache.accumulo.tserver.TabletServer.run(TabletServer.java:2437)
    at org.apache.accumulo.tserver.TabletServer.main(TabletServer.java:2923)
    at org.apache.accumulo.tserver.TServerExecutable.execute(TServerExecutable.java:33)
    at org.apache.accumulo.start.Main$1.run(Main.java:93)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

followed by:

{noformat}
Uncaught exception in TabletServer.main, exiting
java.lang.RuntimeException: Failed to start replication service
    at org.apache.accumulo.tserver.TabletServer.run(TabletServer.java:2439)
    at org.apache.accumulo.tserver.TabletServer.main(TabletServer.java:2923)
    at org.apache.accumulo.tserver.TServerExecutable.execute(TServerExecutable.java:33)
    at org.apache.accumulo.start.Main$1.run(Main.java:93)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: Unable to find a listen port
    at org.apache.accumulo.server.rpc.TServerUtils.startServer(TServerUtils.java:173)
    at org.apache.accumulo.tserver.TabletServer.startReplicationService(TabletServer.java:2284)
    at org.apache.accumulo.tserver.TabletServer.run(TabletServer.java:2437)
    ... 4 more
{noformat}



[jira] [Commented] (ACCUMULO-4329) Replication services don't have property to enable port-search

2016-06-06 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317146#comment-15317146
 ] 

Dave Marion commented on ACCUMULO-4329:
---

bq. you would need to set different ports by hand for replication

That seems inconsistent with what already exists for the TabletServer. Should 
it not re-use the tserver.port.search property?

> Replication services don't have property to enable port-search
> --
>
> Key: ACCUMULO-4329
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4329
> Project: Accumulo
>  Issue Type: Bug
>  Components: tserver
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>
> TabletServer will not start if it cannot reserve the configured replication 
> port. The code passes `null` for enabling the port search feature, which 
> resolves to `false`, and the TServer fails.





[jira] [Commented] (ACCUMULO-3923) VFS ClassLoader doesnt' work with KeywordExecutable

2016-06-12 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326496#comment-15326496
 ] 

Dave Marion commented on ACCUMULO-3923:
---

bq. It is not clear at all what general.vfs.classpaths value is supposed to be. 
A directory? A java classpath style wildcard? A regex for jars? 

From the example in the 
[blog|https://blogs.apache.org/accumulo/entry/the_accumulo_classloader]:
{noformat}
<property>
  <name>general.vfs.classpaths</name>
  <value>hdfs://localhost:8020/accumulo/system-classpath</value>
  <description>Configuration for a system level vfs classloader. Accumulo 
  jars can be configured here and loaded out of HDFS.</description>
</property>
{noformat}

You should be able to do the following:
1. Untar an Accumulo distribution somewhere
2. Create the configuration files (bootstrap_config.sh)
3. Make the appropriate changes to accumulo-env.sh and accumulo-site.xml, 
including the property above
4. Run bootstrap_hdfs.sh; this will push most of the jars into the location 
specified in the general.vfs.classpaths property
5. Do what Christopher suggested 
[above|https://issues.apache.org/jira/browse/ACCUMULO-3923?focusedCommentId=15308710=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15308710]


> VFS ClassLoader doesnt' work with KeywordExecutable
> ---
>
> Key: ACCUMULO-3923
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3923
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Dave Marion
>Priority: Critical
> Fix For: 1.7.3, 1.8.0
>
>
> Trying to make the VFS classloading stuff work and it doesn't seem like 
> ServiceLoader is finding any of the KeywordExecutable implementations.
> Best I can tell after looking into this, VFSClassLoader (created by 
> AccumuloVFSClassLoader) has all of the jars listed as resources, but when 
> ServiceLoader tries to find the META-INF/services definitions, it returns 
> nothing, and thus we think the keyword must be a class name. Seems like a 
> commons-vfs bug.





[jira] [Comment Edited] (ACCUMULO-3923) VFS ClassLoader doesnt' work with KeywordExecutable

2016-06-12 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326496#comment-15326496
 ] 

Dave Marion edited comment on ACCUMULO-3923 at 6/12/16 3:54 PM:


bq. It is not clear at all what general.vfs.classpaths value is supposed to be. 
A directory? A java classpath style wildcard? A regex for jars? 

From the "Running Accumulo From HDFS" section in the 
[blog|https://blogs.apache.org/accumulo/entry/the_accumulo_classloader]:
{noformat}
<property>
  <name>general.vfs.classpaths</name>
  <value>hdfs://localhost:8020/accumulo/system-classpath</value>
  <description>Configuration for a system level vfs classloader. Accumulo 
  jars can be configured here and loaded out of HDFS.</description>
</property>
{noformat}

You should be able to do the following:
1. Untar an Accumulo distribution somewhere
2. Create the configuration files (bootstrap_config.sh)
3. Make the appropriate changes to accumulo-env.sh and accumulo-site.xml, 
including the property above
4. Run bootstrap_hdfs.sh; this will push most of the jars into the location 
specified in the general.vfs.classpaths property
5. Do what Christopher suggested 
[above|https://issues.apache.org/jira/browse/ACCUMULO-3923?focusedCommentId=15308710=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15308710]





[jira] [Commented] (ACCUMULO-4328) Run multiple tablet servers on a single host

2016-06-03 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315257#comment-15315257
 ] 

Dave Marion commented on ACCUMULO-4328:
---

bq. ...with the caveat that going from 1.6.x to 1.6.6 with this capability does 
not change any behavior if the feature is not enabled. And that would include 
needing to modify a current user script in any way.

I believe this to be the case: you should not have to change anything.

bq. Random ports bother me. If there is an option to assign a port, or default 
to a random one if you don't care, okay - if I can only use a random port, that 
seems to be a problem if we try to be more container friendly as well as port 
blocking considerations.

I agree that random ports may be an issue, but it's an issue that predates this 
JIRA. I feel that if we are going to modify tserver.port.client to take a list 
or range of values instead of one value for the purposes that have been stated, 
then that should be a separate JIRA. However, for the purposes of this issue, 
if someone wants to use this feature, then they are also choosing to deal with 
random ports.

bq. The big question that I have is, even if you could do this, would you want 
to?

I don't have any specific evidence to share, but in talking with [~kturner] he 
expressed an interest in this feature. Maybe he can give more specifics. I will 
say that one way to scale Accumulo is to add more tablet servers, and just 
because you have saturated one tablet server does not mean that you have 
saturated the disk or network bandwidth of the host on which it's running.

> Run multiple tablet servers on a single host
> 
>
> Key: ACCUMULO-4328
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4328
> Project: Accumulo
>  Issue Type: Improvement
>  Components: scripts, tserver
>Reporter: Dave Marion
>Assignee: Dave Marion
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Modify scripts and necessary code to run multiple tablet servers on a single 
> host.





[jira] [Commented] (ACCUMULO-3923) VFS ClassLoader doesnt' work with KeywordExecutable

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327631#comment-15327631
 ] 

Dave Marion commented on ACCUMULO-3923:
---

bq. Oh, is this property not also used by the Java ClassLoader? Maybe that was 
part of my confusion.

The property is used by both the bootstrap_hdfs.sh script and the VFS 
classloader. My changes to bootstrap_hdfs.sh are in the parsing of the property 
from accumulo-site.xml.



[jira] [Commented] (ACCUMULO-3923) VFS ClassLoader doesnt' work with KeywordExecutable

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327684#comment-15327684
 ] 

Dave Marion commented on ACCUMULO-3923:
---

Based on my testing, I would say that the issue with VFS-500 is resolved. There 
is a new issue with the ServiceLoader (ACCUMULO-4341), but I believe the 
problem described in this JIRA is resolved. 



[jira] [Commented] (ACCUMULO-4341) ServiceLoader deadlock with classes loaded from HDFS

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327728#comment-15327728
 ] 

Dave Marion commented on ACCUMULO-4341:
---

I can't even run `accumulo init` with the jars in HDFS.

> ServiceLoader deadlock with classes loaded from HDFS
> 
>
> Key: ACCUMULO-4341
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4341
> Project: Accumulo
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>
> With Accumulo set up to use general.vfs.classpaths to load classes from HDFS, 
> running `accumulo help` will hang. 
> A jstack of the process shows the IPC Client thread at:
> {noformat}
>    java.lang.Thread.State: BLOCKED (on object monitor)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2051)
>   at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:91)
>   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>   at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1086)
>   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
> {noformat}
> and the main thread at:
> {noformat}
>    java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:502)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1454)
>   - locked <0xf09a2898> (a org.apache.hadoop.ipc.Client$Call)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1982)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1128)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1124)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1124)
>   at org.apache.commons.vfs2.provider.hdfs.HdfsFileObject.doAttach(HdfsFileObject.java:85)
>   at org.apache.commons.vfs2.provider.AbstractFileObject.attach(AbstractFileObject.java:173)
>   - locked <0xf57fd008> (a org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
>   at org.apache.commons.vfs2.provider.AbstractFileObject.getContent(AbstractFileObject.java:1236)
>   - locked <0xf57fd008> (a org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
>   at org.apache.commons.vfs2.impl.VFSClassLoader.getPermissions(VFSClassLoader.java:300)
>   at java.security.SecureClassLoader.getProtectionDomain(SecureClassLoader.java:206)
>   - locked <0xf5ad9138> (a java.util.HashMap)
>   at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at org.apache.commons.vfs2.impl.VFSClassLoader.defineClass(VFSClassLoader.java:226)
>   at org.apache.commons.vfs2.impl.VFSClassLoader.findClass(VFSClassLoader.java:180)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   - locked <0xf5af3b88> (a org.apache.commons.vfs2.impl.VFSClassLoader)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>   - locked <0xf6f5c2f8> (a org.apache.commons.vfs2.impl.VFSClassLoader)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
>   at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
>   at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
>   at 

[jira] [Commented] (ACCUMULO-3923) VFS ClassLoader doesnt' work with KeywordExecutable

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327632#comment-15327632
 ] 

Dave Marion commented on ACCUMULO-3923:
---

I pushed my changes to the script.



[jira] [Commented] (ACCUMULO-3923) VFS ClassLoader doesnt' work with KeywordExecutable

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327650#comment-15327650
 ] 

Dave Marion commented on ACCUMULO-3923:
---

Yes, it should.



[jira] [Commented] (ACCUMULO-4341) ServiceLoader deadlock with classes loaded from HDFS

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327763#comment-15327763
 ] 

Dave Marion commented on ACCUMULO-4341:
---

Loading classes out of HDFS has been a supported feature since 1.5 with the new 
classloader. The integration of the ServiceLoader was introduced in 1.7.0. 

I didn't realize it until now, but I am loading jars out of HDFS with 1.7.0 
with the ServiceLoader. In this case, where it is working, I am *only* putting 
my application jars into HDFS and setting the context name on tables. Pushing 
the Accumulo jars themselves into HDFS is likely the case where it will not work.
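The working setup described above (application jars only, loaded per table via a named context) looks roughly like the following; the context name, table name, and HDFS paths are examples, not values from this issue:

{noformat}
# Put only application jars (not Accumulo's own jars) into HDFS:
#   hdfs dfs -mkdir -p /app-jars
#   hdfs dfs -put my-iterators.jar /app-jars/

# In accumulo-site.xml, define a named VFS context:
general.vfs.context.classpath.myapp=hdfs://namenode:8020/app-jars/.*.jar

# Then point a table at that context from the Accumulo shell:
#   config -t mytable -s table.classpath.context=myapp
{noformat}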

> ServiceLoader deadlock with classes loaded from HDFS
> 
>
> Key: ACCUMULO-4341
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4341
> Project: Accumulo
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>
> With Accumulo set up to use general.vfs.classpaths to load classes from HDFS, 
> running `accumulo help` will hang. 
> A jstack of the process shows the IPC Client thread at:
> {noformat}
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2051)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:91)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>   at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1086)
>   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
> {noformat}
> and the main thread at:
> {noformat}
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:502)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1454)
>   - locked <0xf09a2898> (a org.apache.hadoop.ipc.Client$Call)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1982)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1128)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1124)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1124)
>   at 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileObject.doAttach(HdfsFileObject.java:85)
>   at 
> org.apache.commons.vfs2.provider.AbstractFileObject.attach(AbstractFileObject.java:173)
>   - locked <0xf57fd008> (a 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
>   at 
> org.apache.commons.vfs2.provider.AbstractFileObject.getContent(AbstractFileObject.java:1236)
>   - locked <0xf57fd008> (a 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.getPermissions(VFSClassLoader.java:300)
>   at 
> java.security.SecureClassLoader.getProtectionDomain(SecureClassLoader.java:206)
>   - locked <0xf5ad9138> (a java.util.HashMap)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.defineClass(VFSClassLoader.java:226)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.findClass(VFSClassLoader.java:180)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   - locked <0xf5af3b88> (a 
> org.apache.commons.vfs2.impl.VFSClassLoader)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>   - locked <0xf6f5c2f8> (a 
> 

[jira] [Commented] (ACCUMULO-4341) ServiceLoader deadlock with classes loaded from HDFS

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327854#comment-15327854
 ] 

Dave Marion commented on ACCUMULO-4341:
---

I'm wondering if this is really due to a change in ClassLoader between Java 6 
and 7. If you look at the javadoc for the two-arg form of 
ClassLoader.loadClass(), Java 7 introduced a per-class-name lock and 
parallel-capable class loaders.
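For context, a sketch (not Accumulo code) of the Java 7 mechanism referenced above: a ClassLoader subclass can register itself as parallel capable in a static initializer, after which loadClass() synchronizes on a per-class-name lock object instead of on the loader instance itself. Loaders that do not register keep the old coarse lock, which is one ingredient for cross-thread deadlocks like the one in the stack traces.

```java
// Illustrative only: a parallel-capable delegating classloader.
// Registering in the static initializer makes getClassLoadingLock() return
// a distinct lock per class name, so two threads loading different classes
// through this loader no longer serialize on the loader instance itself.
public class ParallelLoader extends ClassLoader {
    static {
        // Must be called before any instance is constructed (Java 7+).
        registerAsParallelCapable();
    }

    public ParallelLoader(ClassLoader parent) {
        super(parent);
    }

    public static void main(String[] args) throws Exception {
        ParallelLoader loader = new ParallelLoader(ParallelLoader.class.getClassLoader());
        // Parent delegation still works as usual; the difference is purely
        // in the locking discipline inside loadClass().
        Class<?> c = loader.loadClass("java.lang.String");
        System.out.println(c == String.class); // true: resolved via the parent
    }
}
```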

> ServiceLoader deadlock with classes loaded from HDFS
> 
>
> Key: ACCUMULO-4341
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4341
> Project: Accumulo
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>Priority: Blocker
> Fix For: 1.7.2, 1.8.0
>
>
> With Accumulo set up to use general.vfs.classpaths to load classes from HDFS, 
> running `accumulo help` will hang. 

[jira] [Updated] (ACCUMULO-3923) bootstrap_hdfs.sh does not copy correct jars to hdfs

2016-06-14 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-3923:
--
Fix Version/s: 1.7.2

>  bootstrap_hdfs.sh does not copy correct jars to hdfs
> -
>
> Key: ACCUMULO-3923
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3923
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Dave Marion
>Priority: Critical
> Fix For: 1.7.2, 1.8.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Resolved] (ACCUMULO-4341) ServiceLoader deadlock with classes loaded from HDFS

2016-06-14 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion resolved ACCUMULO-4341.
---
Resolution: Fixed

> ServiceLoader deadlock with classes loaded from HDFS
> 
>
> Key: ACCUMULO-4341
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4341
> Project: Accumulo
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.7.0, 1.8.0
>Reporter: Dave Marion
>Assignee: Dave Marion
>Priority: Blocker
> Fix For: 1.7.2, 1.8.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> With Accumulo set up to use general.vfs.classpaths to load classes from HDFS, 
> running `accumulo help` will hang. 
> A jstack of the process shows the IPC Client thread at:
> {noformat}
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2051)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:91)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>   at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1086)
>   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
> {noformat}
> and the main thread at:
> {noformat}
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:502)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1454)
>   - locked <0xf09a2898> (a org.apache.hadoop.ipc.Client$Call)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1982)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1128)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1124)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1124)
>   at 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileObject.doAttach(HdfsFileObject.java:85)
>   at 
> org.apache.commons.vfs2.provider.AbstractFileObject.attach(AbstractFileObject.java:173)
>   - locked <0xf57fd008> (a 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
>   at 
> org.apache.commons.vfs2.provider.AbstractFileObject.getContent(AbstractFileObject.java:1236)
>   - locked <0xf57fd008> (a 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.getPermissions(VFSClassLoader.java:300)
>   at 
> java.security.SecureClassLoader.getProtectionDomain(SecureClassLoader.java:206)
>   - locked <0xf5ad9138> (a java.util.HashMap)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.defineClass(VFSClassLoader.java:226)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.findClass(VFSClassLoader.java:180)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   - locked <0xf5af3b88> (a 
> org.apache.commons.vfs2.impl.VFSClassLoader)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>   - locked <0xf6f5c2f8> (a 
> org.apache.commons.vfs2.impl.VFSClassLoader)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
>   at 

[jira] [Commented] (ACCUMULO-4341) ServiceLoader deadlock with classes loaded from HDFS

2016-06-14 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329491#comment-15329491
 ] 

Dave Marion commented on ACCUMULO-4341:
---

Backported ACCUMULO-3923 to 1.7.2-SNAPSHOT and applied this fix to the same 
branch. Tested using the following process:

{noformat}
1. tar zxf accumulo-1.7.2-SNAPSHOT-bin.tar.gz
2. cd accumulo-1.7.2-SNAPSHOT/bin
3. ./build_native_library.sh
4. ./bootstrap_config.sh
5. Update accumulo-env.sh
6. In accumulo-site.xml, set instance.volumes and add:

  
  <property>
    <name>general.vfs.classpaths</name>
    <value>hdfs://<namenode>:<port>/accumulo-1.7.2-SNAPSHOT-system-classpath/.*.jar</value>
  </property>
  

7. ./bootstrap_hdfs.sh
8. accumulo init
9. start-all.sh
{noformat}


> ServiceLoader deadlock with classes loaded from HDFS
> 
>
> Key: ACCUMULO-4341
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4341
> Project: Accumulo
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.7.0, 1.8.0
>Reporter: Dave Marion
>Priority: Blocker
> Fix For: 1.7.2, 1.8.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> With Accumulo set up to use general.vfs.classpaths to load classes from HDFS, 
> running `accumulo help` will hang. 

[jira] [Assigned] (ACCUMULO-4341) ServiceLoader deadlock with classes loaded from HDFS

2016-06-14 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion reassigned ACCUMULO-4341:
-

Assignee: Dave Marion

> ServiceLoader deadlock with classes loaded from HDFS
> 
>
> Key: ACCUMULO-4341
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4341
> Project: Accumulo
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.7.0, 1.8.0
>Reporter: Dave Marion
>Assignee: Dave Marion
>Priority: Blocker
> Fix For: 1.7.2, 1.8.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> With Accumulo set up to use general.vfs.classpaths to load classes from HDFS, 
> running `accumulo help` will hang. 

[jira] [Updated] (ACCUMULO-4341) ServiceLoader deadlock with classes loaded from HDFS

2016-06-14 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-4341:
--
Affects Version/s: 1.7.0

> ServiceLoader deadlock with classes loaded from HDFS
> 
>
> Key: ACCUMULO-4341
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4341
> Project: Accumulo
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.7.0, 1.8.0
>Reporter: Dave Marion
>Assignee: Dave Marion
>Priority: Blocker
> Fix For: 1.7.2, 1.8.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> With Accumulo set up to use general.vfs.classpaths to load classes from HDFS, 
> running `accumulo help` will hang. 

[jira] [Resolved] (ACCUMULO-4331) Make port configuration and allocation consistent across services

2016-06-14 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion resolved ACCUMULO-4331.
---
Resolution: Fixed

> Make port configuration and allocation consistent across services
> -
>
> Key: ACCUMULO-4331
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4331
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 1.8.0
>
> Attachments: ACCUMULO-4331-1.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There was some discussion in ACCUMULO-4328 about ports, so I decided to track 
> down how the client ports are configured and allocated. Issues raised in the 
> discussion were:
>  1. The port search feature was not well understood
>  2. Ephemeral port allocation makes it hard to lock servers down (e.g. 
> iptables)
> Looking through the code I found the following properties allocate a port 
> number based on conf.getPort(). This returns the port number based on the 
> property and supports either a single value or zero. Then, in the server 
> component (monitor, tracer, gc, etc) this value is used when creating a 
> ServerSocket. If the port is already in use, the process will fail.
> {noformat}
> monitor.port.log4j
> trace.port.client
> gc.port.client
> monitor.port.client
> {noformat}
> The following properties use TServerUtils.startServer which uses the value in 
> the property to start the TServer. If the value is zero, then it picks a 
> random port between 1024 and 65535. If tserver.port.search is enabled, then 
> it will try a thousand times to bind to a random port.
> {noformat}
> tserver.port.client
> master.port.client
> master.replication.coordinator.port
> replication.receipt.service.port
> {noformat}
> I'm proposing that we deprecate the tserver.port.search property and the 
> value zero in the property value for the properties above. Instead, I think 
> we should allow the user to specify a single value or a range (M-N). 
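A sketch of what parsing the proposed property syntax could look like (illustrative names, not the attached patch): accept a single port or an inclusive M-N range, and return the candidate ports to try in order instead of falling back to a random ephemeral port.

```java
import java.util.stream.IntStream;

// Hypothetical parser for the proposed port property syntax: a single port
// ("9997") or an inclusive range ("9997-10000"). Class and method names
// here are illustrative, not Accumulo's actual API.
public class PortSpec {

    // Return the candidate ports to attempt, in order. A server would bind
    // to the first free one, which keeps every possible port predictable
    // for firewall rules (iptables), unlike ephemeral allocation.
    static int[] parse(String spec) {
        int dash = spec.indexOf('-');
        if (dash < 0) {
            return new int[] { checkPort(Integer.parseInt(spec.trim())) };
        }
        int low = checkPort(Integer.parseInt(spec.substring(0, dash).trim()));
        int high = checkPort(Integer.parseInt(spec.substring(dash + 1).trim()));
        if (low > high) {
            throw new IllegalArgumentException("invalid range: " + spec);
        }
        return IntStream.rangeClosed(low, high).toArray();
    }

    static int checkPort(int port) {
        if (port < 1024 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }

    public static void main(String[] args) {
        int[] candidates = parse("9997-9999");
        System.out.println(candidates.length + " candidate ports, first: " + candidates[0]);
    }
}
```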





[jira] [Updated] (ACCUMULO-4331) Make port configuration and allocation consistent across services

2016-06-14 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-4331:
--
Attachment: ACCUMULO-4331-1.patch

> Make port configuration and allocation consistent across services
> -
>
> Key: ACCUMULO-4331
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4331
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 1.8.0
>
> Attachments: ACCUMULO-4331-1.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There was some discussion in ACCUMULO-4328 about ports, so I decided to track 
> down how the client ports are configured and allocated. Issues raised in the 
> discussion were:
>  1. The port search feature was not well understood
>  2. Ephemeral port allocation makes it hard to lock servers down (e.g. 
> iptables)
> Looking through the code I found the following properties allocate a port 
> number based on conf.getPort(). This returns the port number based on the 
> property and supports either a single value or zero. Then, in the server 
> component (monitor, tracer, gc, etc) this value is used when creating a 
> ServerSocket. If the port is already in use, the process will fail.
> {noformat}
> monitor.port.log4j
> trace.port.client
> gc.port.client
> monitor.port.client
> {noformat}
> The following properties use TServerUtils.startServer which uses the value in 
> the property to start the TServer. If the value is zero, then it picks a 
> random port between 1024 and 65535. If tserver.port.search is enabled, then 
> it will try a thousand times to bind to a random port.
> {noformat}
> tserver.port.client
> master.port.client
> master.replication.coordinator.port
> replication.receipt.service.port
> {noformat}
> I'm proposing that we deprecate the tserver.port.search property and the 
> special value zero for the properties above. Instead, I think we should 
> allow the user to specify either a single value or a range (M-N). 





[jira] [Created] (ACCUMULO-4344) Fix PortRange property validation

2016-06-14 Thread Dave Marion (JIRA)
Dave Marion created ACCUMULO-4344:
-

 Summary: Fix PortRange property validation
 Key: ACCUMULO-4344
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4344
 Project: Accumulo
  Issue Type: Bug
Reporter: Dave Marion
 Fix For: 1.8.0


PortRange.parse should not modify the ports, just ensure that they are valid.





[jira] [Assigned] (ACCUMULO-4344) Fix PortRange property validation

2016-06-14 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion reassigned ACCUMULO-4344:
-

Assignee: Dave Marion

> Fix PortRange property validation
> -
>
> Key: ACCUMULO-4344
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4344
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 1.8.0
>
>
> PortRange.parse should not modify the ports, just ensure that they are valid.





[jira] [Resolved] (ACCUMULO-4344) Fix PortRange property validation

2016-06-14 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion resolved ACCUMULO-4344.
---
Resolution: Fixed

> Fix PortRange property validation
> -
>
> Key: ACCUMULO-4344
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4344
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 1.8.0
>
>
> PortRange.parse should not modify the ports, just ensure that they are valid.





[jira] [Commented] (ACCUMULO-3923) VFS ClassLoader doesn't work with KeywordExecutable

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327618#comment-15327618
 ] 

Dave Marion commented on ACCUMULO-3923:
---

I have some changes locally to bootstrap_hdfs.sh so that it will support the 
following property values in general.vfs.classpaths:

hdfs://host:port/accumulo/classpath/
hdfs://host:port/accumulo/classpath/.*.jar
hdfs://host:port/accumulo/classpath/.*.jar,hdfs://host:port/accumulo/classpath2/.*.jar

In all cases, it will create the /accumulo/classpath directory if it does not 
exist and push the jars from the local lib directory into it. I also found in 
my testing that the slf4j jars need to be kept on the local server. 
Additionally, when trying to run `accumulo help` to test these changes, I found 
that the client deadlocks. Ticket to follow.
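
The directory handling described above can be sketched like this. The parsing is illustrative only (the real bootstrap_hdfs.sh is a shell script, and the class and method names here are invented); it shows how the target HDFS directory might be derived from a general.vfs.classpaths entry before the jars are pushed up with `hadoop fs -put`.

```java
// Illustrative sketch (not the real bootstrap_hdfs.sh logic): derive the
// target HDFS directory from a single general.vfs.classpaths entry,
// handling both a bare directory and a trailing .*.jar pattern.
public class ClasspathDir {
    public static String targetDir(String entry) {
        // strip scheme and authority: hdfs://host:port/path -> /path
        String path = entry.replaceFirst("^hdfs://[^/]*", "");
        // drop a trailing regex-style jar pattern such as .*.jar
        return path.replaceFirst("/[^/]*\\.jar$", "/");
    }

    public static void main(String[] args) {
        // Both forms above resolve to the same directory; the script would
        // then run: hadoop fs -mkdir -p <dir> && hadoop fs -put lib/*.jar <dir>
        System.out.println(targetDir("hdfs://host:port/accumulo/classpath/"));
        System.out.println(targetDir("hdfs://host:port/accumulo/classpath/.*.jar"));
    }
}
```

For the comma-separated form, the script would apply the same derivation to each entry.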

> VFS ClassLoader doesn't work with KeywordExecutable
> ---
>
> Key: ACCUMULO-3923
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3923
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Dave Marion
>Priority: Critical
> Fix For: 1.7.3, 1.8.0
>
>
> Trying to make the VFS classloading stuff work and it doesn't seem like 
> ServiceLoader is finding any of the KeywordExecutable implementations.
> Best I can tell after looking into this, VFSClassLoader (created by 
> AccumuloVFSClassLoader) has all of the jars listed as resources, but when 
> ServiceLoader tries to find the META-INF/services definitions, it returns 
> nothing, and thus we think the keyword must be a class name. Seems like a 
> commons-vfs bug.





[jira] [Commented] (ACCUMULO-4341) ServiceLoader deadlock with classes loaded from HDFS

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327720#comment-15327720
 ] 

Dave Marion commented on ACCUMULO-4341:
---

No, the service loader feature is busted.

> ServiceLoader deadlock with classes loaded from HDFS
> 
>
> Key: ACCUMULO-4341
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4341
> Project: Accumulo
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>
> With Accumulo set up to use general.vfs.classpaths to load classes from HDFS, 
> running `accumulo help` will hang. 
> A jstack of the process shows the IPC Client thread at:
> {noformat}
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2051)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:91)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>   at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1086)
>   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
> {noformat}
> and the main thread at:
> {noformat}
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:502)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1454)
>   - locked <0xf09a2898> (a org.apache.hadoop.ipc.Client$Call)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1982)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1128)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1124)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1124)
>   at 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileObject.doAttach(HdfsFileObject.java:85)
>   at 
> org.apache.commons.vfs2.provider.AbstractFileObject.attach(AbstractFileObject.java:173)
>   - locked <0xf57fd008> (a 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
>   at 
> org.apache.commons.vfs2.provider.AbstractFileObject.getContent(AbstractFileObject.java:1236)
>   - locked <0xf57fd008> (a 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.getPermissions(VFSClassLoader.java:300)
>   at 
> java.security.SecureClassLoader.getProtectionDomain(SecureClassLoader.java:206)
>   - locked <0xf5ad9138> (a java.util.HashMap)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.defineClass(VFSClassLoader.java:226)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.findClass(VFSClassLoader.java:180)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   - locked <0xf5af3b88> (a 
> org.apache.commons.vfs2.impl.VFSClassLoader)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>   - locked <0xf6f5c2f8> (a 
> org.apache.commons.vfs2.impl.VFSClassLoader)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
>   at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
>   at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
>   at 

[jira] [Commented] (ACCUMULO-4341) ServiceLoader deadlock with classes loaded from HDFS

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328335#comment-15328335
 ] 

Dave Marion commented on ACCUMULO-4341:
---

I think I have this resolved. It's all working locally now, including running 
bootstrap_hdfs.sh and running Accumulo with the jars out of HDFS. I'm going to 
put up a small patch against 1.8.0. I didn't have time to test against 1.7. If 
someone gets to it before I do, feel free to apply the patch and close my PR.

> ServiceLoader deadlock with classes loaded from HDFS
> 
>
> Key: ACCUMULO-4341
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4341
> Project: Accumulo
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>Priority: Blocker
> Fix For: 1.7.2, 1.8.0
>
>
> With Accumulo set up to use general.vfs.classpaths to load classes from HDFS, 
> running `accumulo help` will hang. 
> A jstack of the process shows the IPC Client thread at:
> {noformat}
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2051)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:91)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>   at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1086)
>   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
> {noformat}
> and the main thread at:
> {noformat}
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:502)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1454)
>   - locked <0xf09a2898> (a org.apache.hadoop.ipc.Client$Call)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1982)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1128)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1124)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1124)
>   at 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileObject.doAttach(HdfsFileObject.java:85)
>   at 
> org.apache.commons.vfs2.provider.AbstractFileObject.attach(AbstractFileObject.java:173)
>   - locked <0xf57fd008> (a 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
>   at 
> org.apache.commons.vfs2.provider.AbstractFileObject.getContent(AbstractFileObject.java:1236)
>   - locked <0xf57fd008> (a 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.getPermissions(VFSClassLoader.java:300)
>   at 
> java.security.SecureClassLoader.getProtectionDomain(SecureClassLoader.java:206)
>   - locked <0xf5ad9138> (a java.util.HashMap)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.defineClass(VFSClassLoader.java:226)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.findClass(VFSClassLoader.java:180)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   - locked <0xf5af3b88> (a 
> org.apache.commons.vfs2.impl.VFSClassLoader)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>   - locked <0xf6f5c2f8> (a 
> org.apache.commons.vfs2.impl.VFSClassLoader)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at 

[jira] [Comment Edited] (ACCUMULO-3923) VFS ClassLoader doesn't work with KeywordExecutable

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327618#comment-15327618
 ] 

Dave Marion edited comment on ACCUMULO-3923 at 6/13/16 3:32 PM:


I have some changes locally to bootstrap_hdfs.sh so that it will support the 
following property values in general.vfs.classpaths:

hdfs://host:port/accumulo/classpath/
hdfs://host:port/accumulo/classpath/.*.jar
hdfs://host:port/accumulo/classpath/.\*.jar,hdfs://host:port/accumulo/classpath2/.*.jar

In all cases, it will create the /accumulo/classpath if it does not exist and 
push the jars from the lib directory into the specified directory. I also found 
in my testing that the slf4j jars need to be kept on the local server. 
Additionally, when trying to run `accumulo help` to test these changes, I found 
that the client is in a deadlock situation. Ticket to follow.


was (Author: dlmarion):
I have some changes locally to bootstrap_hdfs.sh so that it will support the 
following property values in general.vfs.classpaths:

hdfs://host:port/accumulo/classpath/
hdfs://host:port/accumulo/classpath/.*.jar
hdfs://host:port/accumulo/classpath/.*.jar,hdfs://host:port/accumulo/classpath2/.*.jar

In all cases, it will create the /accumulo/classpath if it does not exist and 
push the jars from the lib directory into the specified directory. I also found 
in my testing that the slf4j jars need to be kept on the local server. 
Additionally, when trying to run `accumulo help` to test these changes, I found 
that the client is in a deadlock situation. Ticket to follow.

> VFS ClassLoader doesn't work with KeywordExecutable
> ---
>
> Key: ACCUMULO-3923
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3923
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Dave Marion
>Priority: Critical
> Fix For: 1.7.3, 1.8.0
>
>
> Trying to make the VFS classloading stuff work and it doesn't seem like 
> ServiceLoader is finding any of the KeywordExecutable implementations.
> Best I can tell after looking into this, VFSClassLoader (created by 
> AccumuloVFSClassLoader) has all of the jars listed as resources, but when 
> ServiceLoader tries to find the META-INF/services definitions, it returns 
> nothing, and thus we think the keyword must be a class name. Seems like a 
> commons-vfs bug.





[jira] [Created] (ACCUMULO-4341) ServiceLoader deadlock with classes loaded from HDFS

2016-06-13 Thread Dave Marion (JIRA)
Dave Marion created ACCUMULO-4341:
-

 Summary: ServiceLoader deadlock with classes loaded from HDFS
 Key: ACCUMULO-4341
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4341
 Project: Accumulo
  Issue Type: Bug
  Components: client
Affects Versions: 1.8.0
Reporter: Dave Marion


With Accumulo set up to use general.vfs.classpaths to load classes from HDFS, 
running `accumulo help` will hang. 

A jstack of the process shows the IPC Client thread at:
{noformat}
   java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2051)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:91)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1086)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
{noformat}

and the main thread at:

{noformat}
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.hadoop.ipc.Client.call(Client.java:1454)
- locked <0xf09a2898> (a org.apache.hadoop.ipc.Client$Call)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1982)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1128)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1124)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1124)
at 
org.apache.commons.vfs2.provider.hdfs.HdfsFileObject.doAttach(HdfsFileObject.java:85)
at 
org.apache.commons.vfs2.provider.AbstractFileObject.attach(AbstractFileObject.java:173)
- locked <0xf57fd008> (a 
org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
at 
org.apache.commons.vfs2.provider.AbstractFileObject.getContent(AbstractFileObject.java:1236)
- locked <0xf57fd008> (a 
org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
at 
org.apache.commons.vfs2.impl.VFSClassLoader.getPermissions(VFSClassLoader.java:300)
at 
java.security.SecureClassLoader.getProtectionDomain(SecureClassLoader.java:206)
- locked <0xf5ad9138> (a java.util.HashMap)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at 
org.apache.commons.vfs2.impl.VFSClassLoader.defineClass(VFSClassLoader.java:226)
at 
org.apache.commons.vfs2.impl.VFSClassLoader.findClass(VFSClassLoader.java:180)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
- locked <0xf5af3b88> (a 
org.apache.commons.vfs2.impl.VFSClassLoader)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
- locked <0xf6f5c2f8> (a 
org.apache.commons.vfs2.impl.VFSClassLoader)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.apache.accumulo.start.Main.checkDuplicates(Main.java:196)
at org.apache.accumulo.start.Main.getExecutables(Main.java:188)
at org.apache.accumulo.start.Main.main(Main.java:52)
{noformat}





[jira] [Commented] (ACCUMULO-3923) bootstrap_hdfs.sh does not copy correct jars to hdfs

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328621#comment-15328621
 ] 

Dave Marion commented on ACCUMULO-3923:
---

I will cherry-pick this back to 1.7.2 and test it with ACCUMULO-4341 tomorrow.

>  bootstrap_hdfs.sh does not copy correct jars to hdfs
> -
>
> Key: ACCUMULO-3923
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3923
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Dave Marion
>Priority: Critical
> Fix For: 1.8.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Trying to make the VFS classloading stuff work and it doesn't seem like 
> ServiceLoader is finding any of the KeywordExecutable implementations.
> Best I can tell after looking into this, VFSClassLoader (created by 
> AccumuloVFSClassLoader) has all of the jars listed as resources, but when 
> ServiceLoader tries to find the META-INF/services definitions, it returns 
> nothing, and thus we think the keyword must be a class name. Seems like a 
> commons-vfs bug.





[jira] [Commented] (ACCUMULO-3923) bootstrap_hdfs.sh does not copy correct jars to hdfs

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328348#comment-15328348
 ] 

Dave Marion commented on ACCUMULO-3923:
---

When did we move to slf4j?

>  bootstrap_hdfs.sh does not copy correct jars to hdfs
> -
>
> Key: ACCUMULO-3923
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3923
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Dave Marion
>Priority: Critical
> Fix For: 1.8.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Trying to make the VFS classloading stuff work and it doesn't seem like 
> ServiceLoader is finding any of the KeywordExecutable implementations.
> Best I can tell after looking into this, VFSClassLoader (created by 
> AccumuloVFSClassLoader) has all of the jars listed as resources, but when 
> ServiceLoader tries to find the META-INF/services definitions, it returns 
> nothing, and thus we think the keyword must be a class name. Seems like a 
> commons-vfs bug.





[jira] [Commented] (ACCUMULO-4341) ServiceLoader deadlock with classes loaded from HDFS

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327966#comment-15327966
 ] 

Dave Marion commented on ACCUMULO-4341:
---

I believe the VFS code, without the ServiceLoader, will work. It works in 1.6.

> ServiceLoader deadlock with classes loaded from HDFS
> 
>
> Key: ACCUMULO-4341
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4341
> Project: Accumulo
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>Priority: Blocker
> Fix For: 1.7.2, 1.8.0
>
>
> With Accumulo set up to use general.vfs.classpaths to load classes from HDFS, 
> running `accumulo help` will hang. 
> A jstack of the process shows the IPC Client thread at:
> {noformat}
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2051)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:91)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>   at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1086)
>   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
> {noformat}
> and the main thread at:
> {noformat}
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:502)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1454)
>   - locked <0xf09a2898> (a org.apache.hadoop.ipc.Client$Call)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1982)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1128)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1124)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1124)
>   at 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileObject.doAttach(HdfsFileObject.java:85)
>   at 
> org.apache.commons.vfs2.provider.AbstractFileObject.attach(AbstractFileObject.java:173)
>   - locked <0xf57fd008> (a 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
>   at 
> org.apache.commons.vfs2.provider.AbstractFileObject.getContent(AbstractFileObject.java:1236)
>   - locked <0xf57fd008> (a 
> org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.getPermissions(VFSClassLoader.java:300)
>   at 
> java.security.SecureClassLoader.getProtectionDomain(SecureClassLoader.java:206)
>   - locked <0xf5ad9138> (a java.util.HashMap)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.defineClass(VFSClassLoader.java:226)
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.findClass(VFSClassLoader.java:180)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   - locked <0xf5af3b88> (a 
> org.apache.commons.vfs2.impl.VFSClassLoader)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>   - locked <0xf6f5c2f8> (a 
> org.apache.commons.vfs2.impl.VFSClassLoader)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
>   at 

[jira] [Commented] (ACCUMULO-3923) bootstrap_hdfs.sh does not copy correct jars to hdfs

2016-06-13 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327965#comment-15327965
 ] 

Dave Marion commented on ACCUMULO-3923:
---

bq. Any reason you didn't commit your change to 1.7 as well, Dave Marion? (1.6 
too even?)

 Oversight on my part. It can be cherry-picked back to any release that uses 
slf4j.

>  bootstrap_hdfs.sh does not copy correct jars to hdfs
> -
>
> Key: ACCUMULO-3923
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3923
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Dave Marion
>Priority: Critical
> Fix For: 1.8.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Trying to make the VFS classloading stuff work and it doesn't seem like 
> ServiceLoader is finding any of the KeywordExecutable implementations.
> Best I can tell after looking into this, VFSClassLoader (created by 
> AccumuloVFSClassLoader) has all of the jars listed as resources, but when 
> ServiceLoader tries to find the META-INF/services definitions, it returns 
> nothing, and thus we think the keyword must be a class name. Seems like a 
> commons-vfs bug.





[jira] [Commented] (ACCUMULO-4328) Run multiple tablet servers on a single host

2016-06-06 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317492#comment-15317492
 ] 

Dave Marion commented on ACCUMULO-4328:
---

PR against 1.8 has been opened.

> Run multiple tablet servers on a single host
> 
>
> Key: ACCUMULO-4328
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4328
> Project: Accumulo
>  Issue Type: Improvement
>  Components: scripts, tserver
>Reporter: Dave Marion
>Assignee: Dave Marion
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Modify scripts and necessary code to run multiple tablet servers on a single 
> host.





[jira] [Commented] (ACCUMULO-4331) Make port configuration and allocation consistent across services

2016-06-07 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319037#comment-15319037
 ] 

Dave Marion commented on ACCUMULO-4331:
---

I don't know that we can do what you are suggesting in the manner in which you 
are suggesting it. You would have to move the port allocation code out of Java 
and into the shell scripts. Another approach would be to push the port into the 
Log4J MDC and then add the MDC key to the logging pattern. I had taken this 
approach, but using the PID, in the PR against 1.6.x
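
The Log4J MDC approach mentioned above amounts to two pieces: a call such as `MDC.put("port", String.valueOf(boundPort))` once at server startup after the client port is bound, and an `%X{port}` token in the logging pattern. An illustrative log4j.properties fragment (not the actual Accumulo configuration; the appender name is an example):

```
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%d{ISO8601} [%X{port}] %-5p %c: %m%n
```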

> Make port configuration and allocation consistent across services
> -
>
> Key: ACCUMULO-4331
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4331
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Dave Marion
> Fix For: 1.8.0
>
>
> There was some discussion in ACCUMULO-4328 about ports, so I decided to track 
> down how the client ports are configured and allocated. Issues raised in the 
> discussion were:
>  1. The port search feature was not well understood
>  2. Ephemeral port allocation makes it hard to lock servers down (e.g. 
> iptables)
> Looking through the code I found the following properties allocate a port 
> number based on conf.getPort(). This returns the port number based on the 
> property and supports either a single value or zero. Then, in the server 
> component (monitor, tracer, gc, etc) this value is used when creating a 
> ServerSocket. If the port is already in use, the process will fail.
> {noformat}
> monitor.port.log4j
> trace.port.client
> gc.port.client
> monitor.port.client
> {noformat}
> The following properties use TServerUtils.startServer which uses the value in 
> the property to start the TServer. If the value is zero, then it picks a 
> random port between 1024 and 65535. If tserver.port.search is enabled, then 
> it will try a thousand times to bind to a random port.
> {noformat}
> tserver.port.client
> master.port.client
> master.replication.coordinator.port
> replication.receipt.service.port
> {noformat}
> I'm proposing that we deprecate the tserver.port.search property and the 
> value zero in the property value for the properties above. Instead, I think 
> we should allow the user to specify a single value or a range (M-N). 





[jira] [Commented] (ACCUMULO-4331) Make port configuration and allocation consistent across services

2016-06-07 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319043#comment-15319043
 ] 

Dave Marion commented on ACCUMULO-4331:
---

If we set tserver.port.client to "9997,9998," or "9997-" then any one 
of the three tservers in your example would allocate any of the three ports in 
the range specified. Do we want to make the tablet server code aware that there 
could be multiple instances on a host, that it is the Nth instance, and that it 
must allocate the Nth port in the range?
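To make the proposed M-N syntax concrete, here is a minimal sketch of expanding a range spec into the candidate list a server would walk (a hypothetical helper, not Accumulo code; the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: expands a single value ("9997") or a range ("9997-9999")
// into the ordered list of ports a server would try to bind, first-fit.
public class PortRange {
    static List<Integer> candidatePorts(String spec) {
        List<Integer> ports = new ArrayList<>();
        int dash = spec.indexOf('-');
        if (dash < 0) {
            ports.add(Integer.parseInt(spec.trim()));
        } else {
            int first = Integer.parseInt(spec.substring(0, dash).trim());
            int last = Integer.parseInt(spec.substring(dash + 1).trim());
            for (int p = first; p <= last; p++) {
                ports.add(p);
            }
        }
        return ports;
    }

    public static void main(String[] args) {
        System.out.println(candidatePorts("9997-9999")); // [9997, 9998, 9999]
    }
}
```

With first-fit allocation the tablet server would not need to know it is the Nth instance on the host: whichever instance binds a port first removes it from contention, and the three tservers settle onto 9997, 9998, and 9999 in whatever order they start.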

> Make port configuration and allocation consistent across services
> -
>
> Key: ACCUMULO-4331
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4331
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Dave Marion
> Fix For: 1.8.0
>
>
> There was some discussion in ACCUMULO-4328 about ports, so I decided to track 
> down how the client ports are configured and allocated. Issues raised in the 
> discussion were:
>  1. The port search feature was not well understood
>  2. Ephemeral port allocation makes it hard to lock servers down (e.g. 
> iptables)
> Looking through the code I found the following properties allocate a port 
> number based on conf.getPort(). This returns the port number based on the 
> property and supports either a single value or zero. Then, in the server 
> component (monitor, tracer, gc, etc) this value is used when creating a 
> ServerSocket. If the port is already in use, the process will fail.
> {noformat}
> monitor.port.log4j
> trace.port.client
> gc.port.client
> monitor.port.client
> {noformat}
> The following properties use TServerUtils.startServer which uses the value in 
> the property to start the TServer. If the value is zero, then it picks a 
> random port between 1024 and 65535. If tserver.port.search is enabled, then 
> it will try a thousand times to bind to a random port.
> {noformat}
> tserver.port.client
> master.port.client
> master.replication.coordinator.port
> replication.receipt.service.port
> {noformat}
> I'm proposing that we deprecate the tserver.port.search property and the 
> value zero in the property value for the properties above. Instead, I think 
> we should allow the user to specify a single value or a range (M-N). 





[jira] [Commented] (ACCUMULO-4331) Make port configuration and allocation consistent across services

2016-06-07 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319030#comment-15319030
 ] 

Dave Marion commented on ACCUMULO-4331:
---

And note that 0 is totally random, not a one-up attempt that increments until it succeeds
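In other words (per the description above), a value of 0 picks uniformly in [1024, 65535] on each attempt, roughly like this sketch (not the actual Accumulo code):

```java
import java.util.Random;

// Sketch of what "0" means today: a uniformly random port in [1024, 65535],
// not a sequential search upward from a base port.
public class EphemeralPick {
    static int randomPort(Random rng) {
        return 1024 + rng.nextInt(65535 - 1024 + 1);
    }

    public static void main(String[] args) {
        int port = randomPort(new Random());
        System.out.println(port >= 1024 && port <= 65535); // true
    }
}
```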

> Make port configuration and allocation consistent across services
> -
>
> Key: ACCUMULO-4331
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4331
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Dave Marion
> Fix For: 1.8.0
>
>
> There was some discussion in ACCUMULO-4328 about ports, so I decided to track 
> down how the client ports are configured and allocated. Issues raised in the 
> discussion were:
>  1. The port search feature was not well understood
>  2. Ephemeral port allocation makes it hard to lock servers down (e.g. 
> iptables)
> Looking through the code I found the following properties allocate a port 
> number based on conf.getPort(). This returns the port number based on the 
> property and supports either a single value or zero. Then, in the server 
> component (monitor, tracer, gc, etc) this value is used when creating a 
> ServerSocket. If the port is already in use, the process will fail.
> {noformat}
> monitor.port.log4j
> trace.port.client
> gc.port.client
> monitor.port.client
> {noformat}
> The following properties use TServerUtils.startServer which uses the value in 
> the property to start the TServer. If the value is zero, then it picks a 
> random port between 1024 and 65535. If tserver.port.search is enabled, then 
> it will try a thousand times to bind to a random port.
> {noformat}
> tserver.port.client
> master.port.client
> master.replication.coordinator.port
> replication.receipt.service.port
> {noformat}
> I'm proposing that we deprecate the tserver.port.search property and the 
> value zero in the property value for the properties above. Instead, I think 
> we should allow the user to specify a single value or a range (M-N). 





[jira] [Commented] (ACCUMULO-3923) VFS ClassLoader doesnt' work with KeywordExecutable

2016-05-31 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308811#comment-15308811
 ] 

Dave Marion commented on ACCUMULO-3923:
---

FWIW, bootstrap_hdfs.sh should help you get the environment set up correctly.

> VFS ClassLoader doesnt' work with KeywordExecutable
> ---
>
> Key: ACCUMULO-3923
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3923
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Josh Elser
>Priority: Critical
> Fix For: 1.7.2, 1.8.0
>
>
> Trying to make the VFS classloading stuff work and it doesn't seem like 
> ServiceLoader is finding any of the KeywordExecutable implementations.
> Best I can tell after looking into this, VFSClassLoader (created by 
> AccumuloVFSClassLoader) has all of the jars listed as resources, but when 
> ServiceLoader tries to find the META-INF/services definitions, it returns 
> nothing, and thus we think the keyword must be a class name. Seems like a 
> commons-vfs bug.





[jira] [Commented] (ACCUMULO-4350) ShellServerIT fails to load RowSampler class

2016-06-21 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341629#comment-15341629
 ] 

Dave Marion commented on ACCUMULO-4350:
---

Will do.

> ShellServerIT fails to load RowSampler class
> 
>
> Key: ACCUMULO-4350
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4350
> Project: Accumulo
>  Issue Type: Bug
>  Components: test
>Reporter: Josh Elser
>Priority: Critical
> Fix For: 1.9.0
>
>
> Running {{JAVA_HOME=$(/usr/libexec/java_home -v1.8) mvn clean verify 
> -Dit.test=ShellServerIT -Dtest=foobar -DfailIfNoTests=false}}:
> I can see the following which is causing ShellServerIT to never exit:
> {noformat}
> 2016-06-21 00:48:23,675 [impl.ThriftTransportPool] WARN : Thread 
> "Time-limited test" stuck on IO to host.domain:61175 (0) for at least 120324 
> ms
> {noformat}
> Looking in the TabletServer logs for MAC
> {noformat}
> 2016-06-21 00:48:50,019 [tablet.Compactor] ERROR: 
> java.lang.ClassNotFoundException: org.apache.accumulo.core.sample.RowSampler
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.accumulo.core.sample.RowSampler
>   at 
> org.apache.accumulo.core.sample.impl.SamplerFactory.newSampler(SamplerFactory.java:45)
>   at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:91)
>   at 
> org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:74)
>   at 
> org.apache.accumulo.core.file.FileOperations$OpenWriterOperation.build(FileOperations.java:331)
>   at org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:201)
>   at 
> org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:1850)
>   at 
> org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:1967)
>   at 
> org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:44)
>   at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at 
> org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.accumulo.core.sample.RowSampler
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.findClass(VFSClassLoader.java:178)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at 
> org.apache.accumulo.start.classloader.vfs.AccumuloVFSClassLoader.loadClass(AccumuloVFSClassLoader.java:110)
>   at 
> org.apache.accumulo.core.sample.impl.SamplerFactory.newSampler(SamplerFactory.java:36)
>   ... 12 more
> 2016-06-21 00:48:50,019 [tablet.Tablet] ERROR: MajC Unexpected exception, 
> extent = h<<
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.accumulo.core.sample.RowSampler
>   at 
> org.apache.accumulo.core.sample.impl.SamplerFactory.newSampler(SamplerFactory.java:45)
>   at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:91)
>   at 
> org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:74)
>   at 
> org.apache.accumulo.core.file.FileOperations$OpenWriterOperation.build(FileOperations.java:331)
>   at org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:201)
>   at 
> org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:1850)
>   at 
> org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:1967)
>   at 
> org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:44)
>   at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at 
> org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.accumulo.core.sample.RowSampler
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.findClass(VFSClassLoader.java:178)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at 
> org.apache.accumulo.start.classloader.vfs.AccumuloVFSClassLoader.loadClass(AccumuloVFSClassLoader.java:110)
>   at 
> org.apache.accumulo.core.sample.impl.SamplerFactory.newSampler(SamplerFactory.java:36)
>  

[jira] [Commented] (ACCUMULO-4350) ShellServerIT fails to load RowSampler class

2016-06-21 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341674#comment-15341674
 ] 

Dave Marion commented on ACCUMULO-4350:
---

I'm running it right now with Java 7 and seeing tons of the same "stuck on IO" 
warnings.

> ShellServerIT fails to load RowSampler class
> 
>
> Key: ACCUMULO-4350
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4350
> Project: Accumulo
>  Issue Type: Bug
>  Components: test
>Reporter: Josh Elser
>Priority: Critical
> Fix For: 1.9.0
>
>
> Running {{JAVA_HOME=$(/usr/libexec/java_home -v1.8) mvn clean verify 
> -Dit.test=ShellServerIT -Dtest=foobar -DfailIfNoTests=false}}:
> I can see the following which is causing ShellServerIT to never exit:
> {noformat}
> 2016-06-21 00:48:23,675 [impl.ThriftTransportPool] WARN : Thread 
> "Time-limited test" stuck on IO to host.domain:61175 (0) for at least 120324 
> ms
> {noformat}
> Looking in the TabletServer logs for MAC
> {noformat}
> 2016-06-21 00:48:50,019 [tablet.Compactor] ERROR: 
> java.lang.ClassNotFoundException: org.apache.accumulo.core.sample.RowSampler
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.accumulo.core.sample.RowSampler
>   at 
> org.apache.accumulo.core.sample.impl.SamplerFactory.newSampler(SamplerFactory.java:45)
>   at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:91)
>   at 
> org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:74)
>   at 
> org.apache.accumulo.core.file.FileOperations$OpenWriterOperation.build(FileOperations.java:331)
>   at org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:201)
>   at 
> org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:1850)
>   at 
> org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:1967)
>   at 
> org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:44)
>   at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at 
> org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.accumulo.core.sample.RowSampler
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.findClass(VFSClassLoader.java:178)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at 
> org.apache.accumulo.start.classloader.vfs.AccumuloVFSClassLoader.loadClass(AccumuloVFSClassLoader.java:110)
>   at 
> org.apache.accumulo.core.sample.impl.SamplerFactory.newSampler(SamplerFactory.java:36)
>   ... 12 more
> 2016-06-21 00:48:50,019 [tablet.Tablet] ERROR: MajC Unexpected exception, 
> extent = h<<
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.accumulo.core.sample.RowSampler
>   at 
> org.apache.accumulo.core.sample.impl.SamplerFactory.newSampler(SamplerFactory.java:45)
>   at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:91)
>   at 
> org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:74)
>   at 
> org.apache.accumulo.core.file.FileOperations$OpenWriterOperation.build(FileOperations.java:331)
>   at org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:201)
>   at 
> org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:1850)
>   at 
> org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:1967)
>   at 
> org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:44)
>   at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at 
> org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.accumulo.core.sample.RowSampler
>   at 
> org.apache.commons.vfs2.impl.VFSClassLoader.findClass(VFSClassLoader.java:178)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at 
> org.apache.accumulo.start.classloader.vfs.AccumuloVFSClassLoader.loadClass(AccumuloVFSClassLoader.java:110)
>   at 
> 

[jira] [Commented] (ACCUMULO-4350) ShellServerIT fails to load RowSampler class

2016-06-21 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341723#comment-15341723
 ] 

Dave Marion commented on ACCUMULO-4350:
---

Results are below. I saw the same error you did in the logs, but I don't think 
it's related. I saw a ton of "stuck on IO" warnings at two minutes, and these 
fail after 60 seconds:

{noformat}
Tests in error: 
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.createTableWithProperties:836 » TestTimedOut test timed out 
afte...
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleterows:1142 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.egrep:420 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.exporttableImporttable:358 » TestTimedOut test timed out after 
6...
  ShellServerIT.formatter:1198 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.iter:535 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.maxrow:1417 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.merge:1429 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.namespaces:1657 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.testCompactionSelection:990 » TestTimedOut test timed out after 
...
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.testCompactions:873 » TestTimedOut test timed out after 60 
secon...
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.testScanScample:1033 » TestTimedOut test timed out after 60 
seco...
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.user:484 » TestTimedOut test timed out after 60 seconds
  ShellServerIT.deleteTables:291 » TestTimedOut test timed out after 60 seconds

Tests run: 49, Failures: 0, Errors: 47, Skipped: 0
{noformat}

I think the ShellServerIT.testScanScample test is failing because the 
accumulo-core.jar file is not copied into the 
target/mini-tests/org.apache.accumulo.harness.SharedMiniClusterBase_*/lib/ext 
directory.

> ShellServerIT fails to load RowSampler class
> 
>
> Key: ACCUMULO-4350
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4350
> Project: Accumulo
>  Issue Type: Bug
>  Components: test
>Reporter: Josh Elser
>Priority: Critical
> Fix For: 1.9.0
>
>
> Running {{JAVA_HOME=$(/usr/libexec/java_home -v1.8) mvn clean verify 
> -Dit.test=ShellServerIT -Dtest=foobar -DfailIfNoTests=false}}:
> 

[jira] [Commented] (ACCUMULO-4329) Replication services don't have property to enable port-search

2016-06-20 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15339470#comment-15339470
 ] 

Dave Marion commented on ACCUMULO-4329:
---

It's an issue in that the feature is not supported, but not critical, as the 
workaround is to set replication.receipt.service.port to zero. Having said 
that, I don't know if the intent is to support the feature for this port. I 
would say close it for now; this is documentation enough for anyone who runs 
into it.
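The workaround would look something like this in accumulo-site.xml (a sketch; zero tells the tserver to pick an ephemeral port for the replication service rather than fail on a bind conflict):

```xml
<property>
  <name>replication.receipt.service.port</name>
  <value>0</value>
</property>
```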

> Replication services don't have property to enable port-search
> --
>
> Key: ACCUMULO-4329
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4329
> Project: Accumulo
>  Issue Type: Bug
>  Components: tserver
>Reporter: Dave Marion
> Fix For: 1.8.0
>
>
> TabletServer will not start if it cannot reserve the configured replication 
> port. The code passes `null` for enabling the port search feature, which 
> resolves to `false`, and the TServer fails.





[jira] [Commented] (ACCUMULO-3923) bootstrap_hdfs.sh does not copy correct jars to hdfs

2016-06-20 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340873#comment-15340873
 ] 

Dave Marion commented on ACCUMULO-3923:
---

Yes, with ACCUMULO-4341

>  bootstrap_hdfs.sh does not copy correct jars to hdfs
> -
>
> Key: ACCUMULO-3923
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3923
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Dave Marion
>Priority: Critical
> Fix For: 1.7.2, 1.8.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Trying to make the VFS classloading stuff work and it doesn't seem like 
> ServiceLoader is finding any of the KeywordExecutable implementations.
> Best I can tell after looking into this, VFSClassLoader (created by 
> AccumuloVFSClassLoader) has all of the jars listed as resources, but when 
> ServiceLoader tries to find the META-INF/services definitions, it returns 
> nothing, and thus we think the keyword must be a class name. Seems like a 
> commons-vfs bug.





[jira] [Commented] (ACCUMULO-4353) Stabilize tablet assignment during transient failure

2016-06-23 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347329#comment-15347329
 ] 

Dave Marion commented on ACCUMULO-4353:
---

bq. Can you expand on this some more? Given that assignment is arguably the 
most important thing for the Master to do, why are we concerned about letting 
the master do that as fast as it can (for the aforementioned reason)? Do we 
need to come up with a more efficient way for the master to handle the 
reassignment of many tablets?

Reading through this, and bringing some first-hand experience, I don't think 
the issue is the Master assigning tablets. It's the issue of tablet servers 
that are down for a short period of time. When a tserver goes down, the Master 
re-assigns its tablets. When the tserver comes back up, it goes through several 
rounds of balancing, which can take a long time and cause a lot of churn.

bq. I'm a little worried about this as a configuration knob – I feel like it 
kind of goes against the highly-available distributed database which we expect 
Accumulo to be. When we don't reassign tablets fast, that is a direct lack of 
availability for clients to read data.

I don't see any harm done here as long as the default behavior is what happens 
today. Allowing an administrator to choose to delay tablet reassignment may not 
fit most use cases, but it could fit some.

My 2 cents.

> Stabilize tablet assignment during transient failure
> 
>
> Key: ACCUMULO-4353
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4353
> Project: Accumulo
>  Issue Type: Improvement
>Reporter: Shawn Walker
>Assignee: Shawn Walker
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a tablet server dies, Accumulo attempts to reassign the tablets it was 
> hosting as quickly as possible to maintain availability.  If multiple tablet 
> servers die in quick succession, such as from a rolling restart of the 
> Accumulo cluster or a network partition, this behavior can cause a storm of 
> reassignment and rebalancing, placing significant load on the master.
> To avert such load, Accumulo should be capable of maintaining a steady tablet 
> assignment state in the face of transient tablet server loss.  Instead of 
> reassigning tablets as quickly as possible, Accumulo should await the 
> return of a temporarily downed tablet server (for some configurable duration) 
> before assigning its tablets to other tablet servers.





[jira] [Commented] (ACCUMULO-4353) Stabilize tablet assignment during transient failure

2016-06-23 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347356#comment-15347356
 ] 

Dave Marion commented on ACCUMULO-4353:
---

bq. there's always a concern of technical debt (in terms of complexity)

Fair enough.

bq. If this is really about trying to make rolling-restarts better.

Not sure; I think it has to do with quick unplanned restarts, but it would be 
good to clear that up. I see a rolling restart as an intentional, planned 
activity. I think (based on "failure" in the title) this is for the 
unintentional, unplanned, short-duration outage (e.g. losing connectivity to a 
rack for a short time) where the administrator wants to bring the failed tablet 
servers back up as soon as possible. 

> Stabilize tablet assignment during transient failure
> 
>
> Key: ACCUMULO-4353
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4353
> Project: Accumulo
>  Issue Type: Improvement
>Reporter: Shawn Walker
>Assignee: Shawn Walker
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a tablet server dies, Accumulo attempts to reassign the tablets it was 
> hosting as quickly as possible to maintain availability.  If multiple tablet 
> servers die in quick succession, such as from a rolling restart of the 
> Accumulo cluster or a network partition, this behavior can cause a storm of 
> reassignment and rebalancing, placing significant load on the master.
> To avert such load, Accumulo should be capable of maintaining a steady tablet 
> assignment state in the face of transient tablet server loss.  Instead of 
> reassigning tablets as quickly as possible, Accumulo should await the 
> return of a temporarily downed tablet server (for some configurable duration) 
> before assigning its tablets to other tablet servers.





[jira] [Commented] (ACCUMULO-4342) Admin#stopTabletServer(ClientContext, List, boolean) doesn't work with dynamic ports (0)

2016-06-23 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347493#comment-15347493
 ] 

Dave Marion commented on ACCUMULO-4342:
---

If I remember correctly, this has been around for a while and not associated 
with my recent changes.

> Admin#stopTabletServer(ClientContext, List, boolean) doesn't work 
> with dynamic ports (0)
> 
>
> Key: ACCUMULO-4342
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4342
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Josh Elser
> Fix For: 1.6.6, 1.7.3, 1.8.1
>
>
> Noticed in Dave's changeset from ACCUMULO-4331 that the logic to stop the 
> tabletservers when invoking `admin stop` won't work when the ports are set to 
> '0' (bind a free port in the ephemeral range).
> Looks like we'd have to do a few things to make this work properly:
> 1. If the tserver client port is '0' and no port is provided in the hostname 
> to `admin stop`, we should look at ZK to stop all tservers on that host.
> 2. If the tserver client port is '0' and a port is provided in the hostname 
> to `admin stop`, we should try to just stop the tserver with the given port 
> on that host.
> Would have to look more closely at the code to verify this all makes sense.





[jira] [Comment Edited] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-02-24 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163437#comment-15163437
 ] 

Dave Marion edited comment on ACCUMULO-1755 at 2/24/16 6:07 PM:


We could solve this by:

1. Making MutationSet.mutations a ConcurrentHashMap
2. Making MutationSet.memoryUsed an AtomicLong
3. Not synchronizing on access to TabletServerBatchWriter.mutations
4. Changing TabletServerBatchWriter.mutations to an AtomicReference so that it 
is safe to swap it out in startProcessing()
5. In startProcessing(), swap in a new MutationSet then add the mutations from 
the previous MutationSet to the writer.


was (Author: dlmarion):
We could solve this by:

1. Making MutationSet.mutations a ConcurrentHashMap
2. Not synchronizing on access to TabletServerBatchWriter.mutations
3. Changing TabletServerBatchWriter.mutations to an AtomicReference so that it 
is safe to swap it out in startProcessing()
4. In startProcessing(), swap in a new MutationSet then add the mutations from 
the previous MutationSet to the writer.
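A simplified sketch of the swap idea in the steps above (the names mirror TabletServerBatchWriter and MutationSet, but this is illustrative code, not the real classes, and mutations are modeled as strings):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Sketch: addMutation appends without a writer-wide synchronized block, and
// startProcessing() swaps in a fresh MutationSet so the old set can be binned
// and sent while new mutations keep arriving.
public class SwapSketch {
    static class MutationSet {
        final Map<String, List<String>> mutations = new ConcurrentHashMap<>();
        final AtomicLong memoryUsed = new AtomicLong();

        void add(String table, String mutation) {
            mutations.computeIfAbsent(table, t -> new CopyOnWriteArrayList<>()).add(mutation);
            memoryUsed.addAndGet(mutation.length());
        }
    }

    final AtomicReference<MutationSet> mutations = new AtomicReference<>(new MutationSet());

    void addMutation(String table, String mutation) {
        mutations.get().add(table, mutation); // no lock held while adding
    }

    MutationSet startProcessing() {
        // Swap in an empty set; the returned (previous) set is handed off
        // to the binning/send threads.
        return mutations.getAndSet(new MutationSet());
    }
}
```

One edge the real change would have to handle: a thread can read the reference just before the swap and still be adding to the old set while it is being processed, so the handoff needs some form of quiescence check or tolerance for late additions.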

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.





[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-02-24 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163460#comment-15163460
 ] 

Dave Marion commented on ACCUMULO-1755:
---

Maybe something for 2.0? I wasn't looking to do an entire rewrite, just remove 
some of the locking.

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.





[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-02-24 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163437#comment-15163437
 ] 

Dave Marion commented on ACCUMULO-1755:
---

We could solve this by:

1. Making MutationSet.mutations a ConcurrentHashMap
2. Not synchronizing on access to TabletServerBatchWriter.mutations
3. Changing TabletServerBatchWriter.mutations to an AtomicReference so that it 
is safe to swap it out in startProcessing()
4. In startProcessing(), swap in a new MutationSet then add the mutations from 
the previous MutationSet to the writer.

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.





[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-02-24 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163513#comment-15163513
 ] 

Dave Marion commented on ACCUMULO-1755:
---

https://reviews.apache.org/r/43957/


> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.





[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-02-24 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163587#comment-15163587
 ] 

Dave Marion commented on ACCUMULO-1755:
---

Adam, I have this issue now where I have N clients sharing a batch writer. As 
you noted in the description, all the threads wait on binning mutations. I 
could use a batch writer per thread, and that may be the solution in the end. I 
think I can remove the synchronized modifier from addMutation, but in the end I 
may just be pushing the problem to an area of the code that the client has no 
control over. I'm interested in solving this issue, though; any time you can 
spare would be appreciated.

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.





[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-02-24 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163732#comment-15163732
 ] 

Dave Marion commented on ACCUMULO-1755:
---

No performance numbers, just seeing BLOCKED threads in a stack trace :-). I'll 
see what I can do about getting some performance numbers with and without my 
final patch. Do you think continuous ingest would be a good framework for this?

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.





[jira] [Commented] (ACCUMULO-4169) TabletServer.config contextCleaner removes contexts that are not set on a table

2016-03-29 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216311#comment-15216311
 ] 

Dave Marion commented on ACCUMULO-4169:
---

[~bfloss] [~ivan.bella] I'd like to come to some consensus on the "in-use" 
issue. You have suggested removing contexts that don't have running scans 
using them, in an attempt to reduce the used PermGen space. [~ctubbsii] mentioned 
that PermGen space is not as big of an issue with Java 8 (and likely beyond). 
So the question is, I think: is using -XX:+CMSClassUnloadingEnabled in Java 7 
sufficient if we remove contexts that are no longer defined in the 
configuration?

> TabletServer.config contextCleaner removes contexts that are not set on a 
> table
> ---
>
> Key: ACCUMULO-4169
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4169
> Project: Accumulo
>  Issue Type: Bug
>  Components: tserver
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>
> ACCUMULO-3948 added a feature where you could define a context in the 
> Accumulo configuration, not set it on a table, and use it in a Scanner. 
> However, there is a runnable created in TabletServer.config() that runs every 
> 60 seconds and closes contexts that are not defined on a table. I am suggesting 
> that we have the context cleaner not close any context defined in the 
> configuration.





[jira] [Commented] (ACCUMULO-4171) Update to htrace-core4

2016-04-07 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230632#comment-15230632
 ] 

Dave Marion commented on ACCUMULO-4171:
---

Oh ok. What is this ticket for then?

> Update to htrace-core4
> --
>
> Key: ACCUMULO-4171
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4171
> Project: Accumulo
>  Issue Type: Improvement
>  Components: trace
>Reporter: Mike Drob
> Fix For: 1.8.0
>
>
> We are currently using HTrace 3.1.0-incubating. There were some API changes 
> and improvements on the way to HTrace 4.x-incubating, and we should stay up 
> to date with those changes.





[jira] [Commented] (ACCUMULO-4171) Update to htrace-core4

2016-04-07 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230577#comment-15230577
 ] 

Dave Marion commented on ACCUMULO-4171:
---

Actually, let me clarify, since I am running Accumulo 1.7.1 on CDH 5.6.0 and it 
works at runtime. We have no versions of Accumulo that will *compile* against 
Hadoop 2.6+, correct?

> Update to htrace-core4
> --
>
> Key: ACCUMULO-4171
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4171
> Project: Accumulo
>  Issue Type: Improvement
>  Components: trace
>Reporter: Mike Drob
> Fix For: 1.8.0
>
>
> We are currently using HTrace 3.1.0-incubating. There were some API changes 
> and improvements on the way to HTrace 4.x-incubating, and we should stay up 
> to date with those changes.





[jira] [Commented] (ACCUMULO-4171) Update to htrace-core4

2016-04-07 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230572#comment-15230572
 ] 

Dave Marion commented on ACCUMULO-4171:
---

Am I to read this that no Accumulo versions will work with Hadoop 2.6 and 
Hadoop 2.7?

> Update to htrace-core4
> --
>
> Key: ACCUMULO-4171
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4171
> Project: Accumulo
>  Issue Type: Improvement
>  Components: trace
>Reporter: Mike Drob
> Fix For: 1.8.0
>
>
> We are currently using HTrace 3.1.0-incubating. There were some API changes 
> and improvements on the way to HTrace 4.x-incubating, and we should stay up 
> to date with those changes.





[jira] [Updated] (ACCUMULO-4173) Balance table within a set of hosts

2016-04-12 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-4173:
--
Fix Version/s: 1.7.2

> Balance table within a set of hosts
> ---
>
> Key: ACCUMULO-4173
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4173
> Project: Accumulo
>  Issue Type: Improvement
>  Components: master
>Reporter: Dave Marion
>Assignee: Dave Marion
>  Labels: balancer
> Fix For: 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-4173-1.patch, ACCUMULO-4173-2.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Create a table balancer that will provide a set of hosts for the table tablet 
> balancer to use.





[jira] [Created] (ACCUMULO-4184) HostRegexTableLoadBalancer doesn't work

2016-04-11 Thread Dave Marion (JIRA)
Dave Marion created ACCUMULO-4184:
-

 Summary: HostRegexTableLoadBalancer doesn't work
 Key: ACCUMULO-4184
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4184
 Project: Accumulo
  Issue Type: Bug
  Components: master
Affects Versions: 1.8.0
Reporter: Dave Marion
Assignee: Dave Marion


Ticket to address issues with the new balancer. Performing re-assignment in the 
getAssignments method for tablets that are out of bounds is not working as the 
TabletGroupWatcher prevents it. I believe that in this case the tablet needs to 
be unloaded from its current assigned tablet server so that it can be assigned 
in the next round.





[jira] [Updated] (ACCUMULO-4184) HostRegexTableLoadBalancer fixup

2016-04-12 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-4184:
--
Summary: HostRegexTableLoadBalancer fixup  (was: HostRegexTableLoadBalancer 
doesn't work)

> HostRegexTableLoadBalancer fixup
> 
>
> Key: ACCUMULO-4184
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4184
> Project: Accumulo
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>Assignee: Dave Marion
>
> Ticket to address issues with the new balancer. Performing re-assignment in 
> the getAssignments method for tablets that are out of bounds is not working 
> as the TabletGroupWatcher prevents it. I believe that in this case the tablet 
> needs to be unloaded from its current assigned tablet server so that it can 
> be assigned in the next round.





[jira] [Updated] (ACCUMULO-4184) HostRegexTableLoadBalancer fixup

2016-04-12 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-4184:
--
Description: Ticket to address issues with the new balancer.  (was: Ticket 
to address issues with the new balancer. Performing re-assignment in the 
getAssignments method for tablets that are out of bounds is not working as the 
TabletGroupWatcher prevents it. I believe that in this case the tablet needs to 
be unloaded from its current assigned tablet server so that it can be assigned 
in the next round)

> HostRegexTableLoadBalancer fixup
> 
>
> Key: ACCUMULO-4184
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4184
> Project: Accumulo
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>Assignee: Dave Marion
>
> Ticket to address issues with the new balancer.





[jira] [Commented] (ACCUMULO-4184) HostRegexTableLoadBalancer fixup

2016-04-12 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237713#comment-15237713
 ] 

Dave Marion commented on ACCUMULO-4184:
---

Re-opened ticket and changed description to fix some minor issues found after 
commit of ACCUMULO-4173

> HostRegexTableLoadBalancer fixup
> 
>
> Key: ACCUMULO-4184
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4184
> Project: Accumulo
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>Assignee: Dave Marion
>
> Ticket to address issues with the new balancer.





[jira] [Resolved] (ACCUMULO-4184) HostRegexTableLoadBalancer doesn't work

2016-04-12 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion resolved ACCUMULO-4184.
---
Resolution: Won't Fix

Just realized I did not close the related ticket and commit this feature. Will 
address in the related ticket.

> HostRegexTableLoadBalancer doesn't work
> ---
>
> Key: ACCUMULO-4184
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4184
> Project: Accumulo
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>Assignee: Dave Marion
>
> Ticket to address issues with the new balancer. Performing re-assignment in 
> the getAssignments method for tablets that are out of bounds is not working 
> as the TabletGroupWatcher prevents it. I believe that in this case the tablet 
> needs to be unloaded from its current assigned tablet server so that it can 
> be assigned in the next round.





[jira] [Commented] (ACCUMULO-4169) TabletServer.config contextCleaner removes contexts that are not set on a table

2016-03-23 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208870#comment-15208870
 ] 

Dave Marion commented on ACCUMULO-4169:
---

bq. Do we have an API now for creating the context? 

The context object is created under the covers and has no direct API. A user 
can define/undefine a context via the Accumulo configuration.
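For illustration, in the 1.x line a context is defined purely in the site configuration with a VFS classpath property; the context name `mycontext` and the HDFS path below are made up:

```xml
<!-- accumulo-site.xml (illustrative): define a classloader context named
     "mycontext" without setting table.classpath.context on any table. -->
<property>
  <name>general.vfs.context.classpath.mycontext</name>
  <value>hdfs://namenode:8020/accumulo/classpath/mycontext/.*.jar</value>
</property>
```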

> TabletServer.config contextCleaner removes contexts that are not set on a 
> table
> ---
>
> Key: ACCUMULO-4169
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4169
> Project: Accumulo
>  Issue Type: Bug
>  Components: tserver
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>
> ACCUMULO-3948 added a feature where you could define a context in the 
> Accumulo configuration, not set it on a table, and use it in a Scanner. 
> However, there is a runnable created in TabletServer.config() that runs every 
> 60 seconds and closes contexts that are not defined on a table. I am suggesting 
> that we have the context cleaner not close any context defined in the 
> configuration.





[jira] [Commented] (ACCUMULO-4169) TabletServer.config contextCleaner removes contexts that are not set on a table

2016-03-24 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210599#comment-15210599
 ] 

Dave Marion commented on ACCUMULO-4169:
---

bq.  I'm not sure what kind of protection we could provide to prevent clients 
in a multi-tenant environment from tanking the entire system.

An administrator still has to define the context in the configuration. 

Regarding PermGen, ACCUMULO-599 highlighted some fixes, and I think the right 
solution, in Java 7 anyway, is to use -XX:+CMSClassUnloadingEnabled.
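A hedged sketch of what that would look like in accumulo-env.sh (the exact variable name can differ between versions, and -XX:+CMSClassUnloadingEnabled only takes effect together with the CMS collector):

```shell
# accumulo-env.sh (illustrative): enable class unloading during CMS GC on Java 7
# so classloader contexts removed from the configuration can be collected.
export ACCUMULO_TSERVER_OPTS="${ACCUMULO_TSERVER_OPTS} -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled"
```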

> TabletServer.config contextCleaner removes contexts that are not set on a 
> table
> ---
>
> Key: ACCUMULO-4169
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4169
> Project: Accumulo
>  Issue Type: Bug
>  Components: tserver
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>
> ACCUMULO-3948 added a feature where you could define a context in the 
> Accumulo configuration, not set it on a table, and use it in a Scanner. 
> However, there is a runnable created in TabletServer.config() that runs every 
> 60 seconds and closes contexts that are not defined on a table. I am suggesting 
> that we have the context cleaner not close any context defined in the 
> configuration.





[jira] [Created] (ACCUMULO-4169) TabletServer.config contextCleaner removes contexts that are not set on a table

2016-03-23 Thread Dave Marion (JIRA)
Dave Marion created ACCUMULO-4169:
-

 Summary: TabletServer.config contextCleaner removes contexts that 
are not set on a table
 Key: ACCUMULO-4169
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4169
 Project: Accumulo
  Issue Type: Bug
  Components: tserver
Affects Versions: 1.8.0
Reporter: Dave Marion








[jira] [Updated] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-08 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-1755:
--
Attachment: 1755-perf-test.patch
1755-nosync-perf-test.patch

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: 1755-nosync-perf-test.patch, 1755-perf-test.patch, 
> ACCUMULO-1755.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.





[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-08 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184994#comment-15184994
 ] 

Dave Marion commented on ACCUMULO-1755:
---

I wrote a new test that sends 1M mutations total using N threads with a 
BatchWriter buffer of different sizes. The test is run twice and the time 
discarded to account for JVM startup. Then the test is run 10 times and the 
average (in seconds) is reported for total time and time to add mutations. 

First, I added some code to the TSBW to determine that with my test data I was 
sending the following number of batches using 1, 10, and 100MB buffers:

||BatchWriter Max Memory Size||Flushes To Accumulo||
| 1MB   |  515 |
| 10MB  |  52  |
| 100MB |  6   |

Here are the results of the test:

h2. master branch

Using the patch 1755-perf-test.patch

Total Time

||Threads|| 1MB || 10MB || 100MB ||
| 1 | 3.121  | 2.818 | 3.158 |
| 2 | 3.102 | 2.414 | 2.950 |
| 4 | 3.367 | 2.573 | 3.114 |
| 8 | 3.422 | 2.569 | 3.140 |
| 16 | 3.590 | 2.741 | 3.332 |

Add Mutation Time

||Threads|| 1MB || 10MB || 100MB ||
| 1 | 3.114 | 2.733 | 2.498 |
| 2 | 3.088 | 2.350 | 2.371 |
| 4 | 3.360 | 2.506 | 2.472 |
| 8 | 3.414 | 2.516 | 2.509 |
| 16 | 3.582 | 2.692 | 2.696 |

h2. master branch with modifications to remove sync on addMutation()

I successfully modified the TSBW to remove the synchronized modifier from 
the addMutation method. The multi-threaded binning test passes, so I have some 
confidence that the data is correct. This run uses the patch 
1755-nosync-perf-test.patch.

Total Time

||Threads|| 1MB || 10MB || 100MB ||
| 1 | 3.080 | 2.766 | 3.255 |
| 2 | 2.972 | 2.420 | 3.137 |
| 4 | 3.162 | 2.492 | 3.190 |
| 8 | 3.100 | 2.658 | 3.623 |
| 16 | 3.393 | 2.898 | 3.743 |

Add Mutation Time

||Threads|| 1MB || 10MB || 100MB ||
| 1 | 3.072 | 2.653 | 2.517 |
| 2 | 2.965 | 2.371 | 2.527 |
| 4 | 3.155 | 2.441 | 2.589 |
| 8 | 3.092 | 2.602 | 2.961 |
| 16 | 3.385 | 2.839 | 2.891 |

I think the results are inconclusive. The tests run with MAC 
(MiniAccumuloCluster) on localhost, so this is likely a best-case scenario. I'd 
be interested to see this re-run on a real cluster.


> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-1755.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.





[jira] [Updated] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-1755:
--
Fix Version/s: 1.7.2
   1.6.6

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.





[jira] [Resolved] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion resolved ACCUMULO-1755.
---
Resolution: Fixed

Committed to 1.6 and merged up to master. Built with 'mvn clean verify 
-DskipITs' on each branch and ran the new IT separately.

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.





[jira] [Updated] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-1755:
--
Attachment: ACCUMULO-1755.patch

Attaching original patch

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-1755.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.





[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176713#comment-15176713
 ] 

Dave Marion commented on ACCUMULO-1755:
---

I took the test that I created and ran it against master and my feature branch 
with 1 to 6 threads. I didn't see much difference, but looking back at it now I 
think it's because the test pre-creates all of the mutations and adds them as 
fast as possible. The test is really for multi-threaded correctness rather than 
performance. In the new code there is still a synchronization point when adding 
the binned mutations to the queues for the tablet servers. The send threads in 
the test (local mini accumulo cluster) must be able to keep up with the adding 
of the binned mutations. I don't expect that to be the case in a real 
deployment. Good news: performance wasn't worse.

I think a better test is to write a simple multi-threaded client that creates 
and adds mutations to a common batch writer. Then, time the application as 
whole trying to insert N mutations with 1 to N client threads. The previous 
implementation blocked all client threads from calling 
BatchWriter.addMutation(), meaning the clients could not do any work. In the 
new implementation the clients will be able to continue to do work, adding 
mutations, and even binning them in their own thread if necessary, before 
blocking. I'll see if I can re-test with this new approach in the next few 
days. Do you have a different thought about how to test this?
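A rough, self-contained sketch of that kind of timing harness, with a stand-in for the shared BatchWriter (all names here are hypothetical; thread and mutation counts are arbitrary, and a real test would drive an actual BatchWriter against a cluster):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: N client threads share one writer; we time how long the adds take.
// The synchronized addMutation models the old single-lock behavior.
class SharedWriterTiming {

    static final class StandInWriter {
        final AtomicLong count = new AtomicLong();

        synchronized void addMutation(String mutation) { // contention point
            count.incrementAndGet();
        }
    }

    static long runClients(int threads, int mutationsPerThread) throws Exception {
        StandInWriter writer = new StandInWriter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < mutationsPerThread; i++) {
                    writer.addMutation("row-" + i);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(threads + " threads took "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
        return writer.count.get();
    }
}
```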

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-1755.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.





[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-03 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178398#comment-15178398
 ] 

Dave Marion commented on ACCUMULO-1755:
---

bq.  The previous implementation blocked all client threads from calling 
BatchWriter.addMutation(), meaning the clients could not do any work. In the 
new implementation the clients will be able to continue to do work, adding 
mutations, and even binning them in their own thread if necessary, before 
blocking.

My statement above is incorrect. We didn't remove the synchronization from 
TabletServerBatchWriter.addMutation; we only made it so that the binning is 
done either in a background thread or in the current thread.

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-1755.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.





[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-04 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180139#comment-15180139
 ] 

Dave Marion commented on ACCUMULO-1755:
---

[~afuchs] [~kturner] FWIW, I have been doing some testing locally and have not 
been able to show any real performance improvement. Running an application with 
this patch still shows multiple client threads blocking on 
TabletServerBatchWriter.addMutation() because of the synchronization on that 
method; all the patch did was make one client thread at a time execute that 
method faster. I think the real performance improvement will come from removing 
the synchronized modifier from the addMutation method.
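The direction suggested above could look something like the following sketch. This is illustrative, not Accumulo code: client threads enqueue onto a concurrent queue without a method-level lock, so the only contention is the queue's internal CAS, and a dedicated thread drains and bins.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of dropping the synchronized modifier on addMutation:
// many client threads can enqueue concurrently, while binning happens on a
// separate drain path.
class LockFreeAddSketch {
    private final ConcurrentLinkedQueue<Object> pending = new ConcurrentLinkedQueue<>();
    private final AtomicLong binned = new AtomicLong();

    // No method-level lock: clients only pay the cost of a lock-free enqueue.
    public void addMutation(Object mutation) {
        pending.add(mutation);
    }

    // Drains everything currently queued and "bins" it; in a real writer this
    // would run on a background binning thread and do the tablet lookups.
    public long drainAndBin() {
        Object m;
        while ((m = pending.poll()) != null) {
            binned.incrementAndGet(); // placeholder for metadata lookup + binning
        }
        return binned.get();
    }
}
```

A real implementation would still need back-pressure (blocking clients once the queue holds more than the writer's memory budget), which is where the remaining synchronization point would live.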

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-1755.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.





[jira] [Commented] (ACCUMULO-4004) open WALs prevent DN decommissioning

2016-04-01 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15221909#comment-15221909
 ] 

Dave Marion commented on ACCUMULO-4004:
---

bq.  it sounds like it would be important/useful for ops people to know about.

Agreed. I had not considered updates to the book - I don't have a copy and I 
don't know if there are planned updates.

> open WALs prevent DN decommissioning
> 
>
> Key: ACCUMULO-4004
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4004
> Project: Accumulo
>  Issue Type: Improvement
>  Components: tserver
>Reporter: Eric Newton
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-4004-1.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> It should be possible to manually roll WALs so that files on decommissioning 
> datanodes are closed and the decommissioning process can complete. At the 
> very least, the logs could be closed after an elapsed period of time, such as 
> an hour.





[jira] [Created] (ACCUMULO-4176) Add TSERV_WALOG_MAX_AGE property to 1.8 user manual

2016-04-01 Thread Dave Marion (JIRA)
Dave Marion created ACCUMULO-4176:
-

 Summary: Add TSERV_WALOG_MAX_AGE property to 1.8 user manual
 Key: ACCUMULO-4176
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4176
 Project: Accumulo
  Issue Type: Task
  Components: docs
Affects Versions: 1.8.0
Reporter: Dave Marion
Priority: Blocker








[jira] [Commented] (ACCUMULO-4004) open WALs prevent DN decommissioning

2016-04-01 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222061#comment-15222061
 ] 

Dave Marion commented on ACCUMULO-4004:
---

Yes, I will create a ticket to add the documentation for the new property. Good 
catch.

> open WALs prevent DN decommissioning
> 
>
> Key: ACCUMULO-4004
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4004
> Project: Accumulo
>  Issue Type: Improvement
>  Components: tserver
>Reporter: Eric Newton
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-4004-1.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> It should be possible to manually roll WALs so that files on decommissioning 
> datanodes are closed and the decommissioning process can complete. At the 
> very least, the logs could be closed after an elapsed period of time, such as 
> an hour.





[jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache

2016-04-04 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224240#comment-15224240
 ] 

Dave Marion commented on ACCUMULO-4177:
---

Right, we need to make sure that our users know well in advance when we plan 
to change the minimum Java runtime version so that they can plan ahead. Or, we 
could switch the implementation at runtime depending on the Java version the 
process is running on.

> TinyLFU-based BlockCache
> 
>
> Key: ACCUMULO-4177
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4177
> Project: Accumulo
>  Issue Type: Improvement
>Reporter: Ben Manes
>
> [LruBlockCache|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/LruBlockCache.java]
>  appears to be based on HBase's. I currently have a patch being reviewed in 
> [HBASE-15560|https://issues.apache.org/jira/browse/HBASE-15560] that replaces 
> the pseudo Segmented LRU with the TinyLFU eviction policy. That should allow 
> the cache to make [better 
> predictions|https://github.com/ben-manes/caffeine/wiki/Efficiency] based on 
> frequency and recency, such as improved scan resistance. The implementation 
> uses [Caffeine|https://github.com/ben-manes/caffeine], the successor to 
> Guava's cache, to provide concurrency and keep the patch small.
> Full details are in the JIRA ticket. I think it should be easy to port if 
> there is interest.





[jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache

2016-04-04 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224270#comment-15224270
 ] 

Dave Marion commented on ACCUMULO-4177:
---

A vote on the user list, right?

> TinyLFU-based BlockCache
> 
>
> Key: ACCUMULO-4177
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4177
> Project: Accumulo
>  Issue Type: Improvement
>Reporter: Ben Manes
>
> [LruBlockCache|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/LruBlockCache.java]
>  appears to be based on HBase's. I currently have a patch being reviewed in 
> [HBASE-15560|https://issues.apache.org/jira/browse/HBASE-15560] that replaces 
> the pseudo Segmented LRU with the TinyLFU eviction policy. That should allow 
> the cache to make [better 
> predictions|https://github.com/ben-manes/caffeine/wiki/Efficiency] based on 
> frequency and recency, such as improved scan resistance. The implementation 
> uses [Caffeine|https://github.com/ben-manes/caffeine], the successor to 
> Guava's cache, to provide concurrency and keep the patch small.
> Full details are in the JIRA ticket. I think it should be easy to port if 
> there is interest.





[jira] [Updated] (ACCUMULO-4004) open WALs prevent DN decommissioning

2016-03-29 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-4004:
--
Attachment: ACCUMULO-4004-1.patch

> open WALs prevent DN decommissioning
> 
>
> Key: ACCUMULO-4004
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4004
> Project: Accumulo
>  Issue Type: Improvement
>  Components: tserver
>Reporter: Eric Newton
>Assignee: Eric Newton
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-4004-1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It should be possible to manually roll WALs so that files on decommissioning 
> datanodes are closed and the decommissioning process can complete. At the 
> very least, the logs could be closed after an elapsed period of time, such as 
> an hour.





[jira] [Commented] (ACCUMULO-4004) open WALs prevent DN decommissioning

2016-03-30 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217898#comment-15217898
 ] 

Dave Marion commented on ACCUMULO-4004:
---

Basically, decommissioning is broken right now in Hadoop 2.

WALogs stay open until they hit the size threshold, which could take many hours 
or days in some cases. These open files will prevent a DN from finishing its 
decommissioning process[1]. If you stop the DN, then the WALog file will not be 
closed and you could lose data. You have to find the tservers that are writing 
to the WALog and stop them so that the WALog is closed.

There is also another nasty bug[2] where the NN gives clients old locations of 
blocks that have been moved due to decommissioning. As you can imagine this can 
create all kinds of problems. Then, there is [3] with all of its related issues.

With this patch, you can set the max age to the amount of time you are willing 
to wait for a DN to decommission (if you choose to take the risk of hitting 
[2]).

[1] https://issues.apache.org/jira/browse/HDFS-3599
[2] https://issues.apache.org/jira/browse/HDFS-8208
[3] https://issues.apache.org/jira/browse/HDFS-8406
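For operators, the new knob could be set in accumulo-site.xml along these lines. The property name here is assumed from the TSERV_WALOG_MAX_AGE constant referenced in ACCUMULO-4176, and the 24h value is only an example; check the shipped documentation for the exact name and default.

```xml
<!-- Assumed property name; roll (close) any WAL older than this, even if it
     has not hit the size threshold, so decommissioning DNs can finish. -->
<property>
  <name>tserver.walog.max.age</name>
  <value>24h</value>
</property>
```

Set the value to the longest you are willing to wait for a DN to decommission, weighed against the risk of hitting [2].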

> open WALs prevent DN decommissioning
> 
>
> Key: ACCUMULO-4004
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4004
> Project: Accumulo
>  Issue Type: Improvement
>  Components: tserver
>Reporter: Eric Newton
>Assignee: Eric Newton
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-4004-1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It should be possible to manually roll WALs so that files on decommissioning 
> datanodes are closed and the decommissioning process can complete. At the 
> very least, the logs could be closed after an elapsed period of time, such as 
> an hour.





[jira] [Updated] (ACCUMULO-4173) Balance table within a set of hosts

2016-03-29 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-4173:
--
Attachment: ACCUMULO-4173-1.patch

> Balance table within a set of hosts
> ---
>
> Key: ACCUMULO-4173
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4173
> Project: Accumulo
>  Issue Type: Bug
>  Components: master
>Reporter: Dave Marion
>Assignee: Dave Marion
>  Labels: balancer
> Fix For: 1.8.0
>
> Attachments: ACCUMULO-4173-1.patch
>
>
> Create a table balancer that will provide a set of hosts for the table tablet 
> balancer to use.





[jira] [Commented] (ACCUMULO-4169) TabletServer.config contextCleaner removes contexts that are not set on a table

2016-03-31 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219873#comment-15219873
 ] 

Dave Marion commented on ACCUMULO-4169:
---

I will create a ticket for follow-on work to add a timeout for contexts so that 
they can be closed when they are no longer referenced by running scan or 
compaction threads.

> TabletServer.config contextCleaner removes contexts that are not set on a 
> table
> ---
>
> Key: ACCUMULO-4169
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4169
> Project: Accumulo
>  Issue Type: Bug
>  Components: tserver
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>Assignee: Dave Marion
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ACCUMULO-3948 added a feature where you could define a context in the 
> Accumulo configuration, not set it on a table, and use it in a Scanner. 
> However, there is a runnable created in TabletServer.config() that runs every 
> 60 seconds and closes contexts that are not defined on a table. I suggest 
> that the context cleaner not close any context defined in the 
> configuration.





[jira] [Created] (ACCUMULO-4174) Clean contexts that are not used after some time period

2016-03-31 Thread Dave Marion (JIRA)
Dave Marion created ACCUMULO-4174:
-

 Summary: Clean contexts that are not used after some time period
 Key: ACCUMULO-4174
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4174
 Project: Accumulo
  Issue Type: Improvement
  Components: tserver
Affects Versions: 1.8.0
Reporter: Dave Marion


A runnable is created in TabletServer.config() to clean classpath contexts 
based on a timer. The code currently closes any context that is open but no 
longer defined in the configuration. As an improvement, we could also close 
contexts that are defined but have not been used in some amount of time. To do 
this we would need to track usage of the classloaders behind the context 
objects.
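The tracking described above could be sketched like this. The class and method names are hypothetical, not Accumulo's: the idea is just a per-context last-used timestamp that the cleaner consults before closing anything.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: record a last-used timestamp per context and let the
// cleaner close only contexts idle longer than a threshold.
class ContextIdleTracker {
    private final Map<String, Long> lastUsedMs = new ConcurrentHashMap<>();

    // Call whenever a scan or compaction resolves a class via the context.
    public void touch(String context, long nowMs) {
        lastUsedMs.put(context, nowMs);
    }

    // True if the cleaner may close this context: never used, or idle for
    // longer than maxIdleMs.
    public boolean isExpired(String context, long nowMs, long maxIdleMs) {
        Long last = lastUsedMs.get(context);
        return last == null || nowMs - last > maxIdleMs;
    }
}
```

The hard part in the real tserver would be deciding where to call touch(), since the classloaders are used indirectly by scan and compaction threads.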





[jira] [Resolved] (ACCUMULO-4169) TabletServer.config contextCleaner removes contexts that are not set on a table

2016-03-31 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion resolved ACCUMULO-4169.
---
   Resolution: Fixed
Fix Version/s: 1.8.0
   1.7.2
   1.6.6

> TabletServer.config contextCleaner removes contexts that are not set on a 
> table
> ---
>
> Key: ACCUMULO-4169
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4169
> Project: Accumulo
>  Issue Type: Bug
>  Components: tserver
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> ACCUMULO-3948 added a feature where you could define a context in the 
> Accumulo configuration, not set it on a table, and use it in a Scanner. 
> However, there is a runnable created in TabletServer.config() that runs every 
> 60 seconds and closes contexts that are not defined on a table. I suggest 
> that the context cleaner not close any context defined in the 
> configuration.





[jira] [Updated] (ACCUMULO-4169) TabletServer.config contextCleaner removes contexts that are not set on a table

2016-03-31 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-4169:
--
Attachment: ACCUMULO-4169-1.patch

> TabletServer.config contextCleaner removes contexts that are not set on a 
> table
> ---
>
> Key: ACCUMULO-4169
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4169
> Project: Accumulo
>  Issue Type: Bug
>  Components: tserver
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-4169-1.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> ACCUMULO-3948 added a feature where you could define a context in the 
> Accumulo configuration, not set it on a table, and use it in a Scanner. 
> However, there is a runnable created in TabletServer.config() that runs every 
> 60 seconds and closes contexts that are not defined on a table. I suggest 
> that the context cleaner not close any context defined in the 
> configuration.





[jira] [Assigned] (ACCUMULO-4004) open WALs prevent DN decommissioning

2016-03-31 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion reassigned ACCUMULO-4004:
-

Assignee: Dave Marion  (was: Eric Newton)

> open WALs prevent DN decommissioning
> 
>
> Key: ACCUMULO-4004
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4004
> Project: Accumulo
>  Issue Type: Improvement
>  Components: tserver
>Reporter: Eric Newton
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-4004-1.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> It should be possible to manually roll WALs so that files on decommissioning 
> datanodes are closed and the decommissioning process can complete. At the 
> very least, the logs could be closed after an elapsed period of time, such as 
> an hour.





[jira] [Resolved] (ACCUMULO-4004) open WALs prevent DN decommissioning

2016-03-31 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion resolved ACCUMULO-4004.
---
Resolution: Fixed

> open WALs prevent DN decommissioning
> 
>
> Key: ACCUMULO-4004
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4004
> Project: Accumulo
>  Issue Type: Improvement
>  Components: tserver
>Reporter: Eric Newton
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-4004-1.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> It should be possible to manually roll WALs so that files on decommissioning 
> datanodes are closed and the decommissioning process can complete. At the 
> very least, the logs could be closed after an elapsed period of time, such as 
> an hour.





[jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache

2016-04-02 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223019#comment-15223019
 ] 

Dave Marion commented on ACCUMULO-4177:
---

Caffeine requires Java 8, correct?

> TinyLFU-based BlockCache
> 
>
> Key: ACCUMULO-4177
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4177
> Project: Accumulo
>  Issue Type: Improvement
>Reporter: Ben Manes
>
> [LruBlockCache|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/LruBlockCache.java]
>  appears to be based on HBase's. I currently have a patch being reviewed in 
> [HBASE-15560|https://issues.apache.org/jira/browse/HBASE-15560] that replaces 
> the pseudo Segmented LRU with the TinyLFU eviction policy. That should allow 
> the cache to make [better 
> predictions|https://github.com/ben-manes/caffeine/wiki/Efficiency] based on 
> frequency and recency, such as improved scan resistance. The implementation 
> uses [Caffeine|https://github.com/ben-manes/caffeine], the successor to 
> Guava's cache, to provide concurrency and keep the patch small.
> Full details are in the JIRA ticket. I think it should be easy to port if 
> there is interest.





[jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache

2016-04-02 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223027#comment-15223027
 ] 

Dave Marion commented on ACCUMULO-4177:
---

Fully aware. Hadoop just moved to Java 7 in its 2.7 release, and there are a 
few open issues left for Java 8 (HADOOP-11090). Not sure what our timeline is. 
I know it's been discussed, just not sure if there is a plan.

> TinyLFU-based BlockCache
> 
>
> Key: ACCUMULO-4177
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4177
> Project: Accumulo
>  Issue Type: Improvement
>Reporter: Ben Manes
>
> [LruBlockCache|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/LruBlockCache.java]
>  appears to be based on HBase's. I currently have a patch being reviewed in 
> [HBASE-15560|https://issues.apache.org/jira/browse/HBASE-15560] that replaces 
> the pseudo Segmented LRU with the TinyLFU eviction policy. That should allow 
> the cache to make [better 
> predictions|https://github.com/ben-manes/caffeine/wiki/Efficiency] based on 
> frequency and recency, such as improved scan resistance. The implementation 
> uses [Caffeine|https://github.com/ben-manes/caffeine], the successor to 
> Guava's cache, to provide concurrency and keep the patch small.
> Full details are in the JIRA ticket. I think it should be easy to port if 
> there is interest.





[jira] [Commented] (ACCUMULO-4177) TinyLFU-based BlockCache

2016-04-04 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224292#comment-15224292
 ] 

Dave Marion commented on ACCUMULO-4177:
---

There is currently one blocker[1] open in the Hadoop Java 8 umbrella ticket. 
Additionally, it appears that Hadoop is raising its minimum version in Hadoop 
3.0[2]. While Accumulo works with Java 8, that doesn't mean all of our users' 
dependencies do as well. I suggest changing the minimum Java version in 
Accumulo 2.0 and adding the ability to switch the implementation in cases 
where users are running Accumulo 1.x on Java 8.

[1] https://issues.apache.org/jira/browse/YARN-4714
[2] https://issues.apache.org/jira/browse/HADOOP-11858
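The runtime switch suggested above could be sketched like this. The class, method, and cache names are illustrative: the only real mechanism shown is reading java.specification.version, which uses the "1.x" scheme before Java 9 and a plain major number afterwards.

```java
// Hypothetical sketch: pick the Caffeine/TinyLFU cache only when running on
// Java 8 or newer, otherwise fall back to the existing LruBlockCache.
class CacheChooser {

    // Normalizes "1.7"/"1.8" (pre-Java-9 scheme) and "9", "11", ...
    // (post-Java-9 scheme) to a single feature number: 7, 8, 9, 11, ...
    static int featureVersion(String spec) {
        String[] parts = spec.split("\\.");
        int major = Integer.parseInt(parts[0]);
        return major == 1 ? Integer.parseInt(parts[1]) : major;
    }

    // Illustrative selection; real code would instantiate the cache class.
    static String chooseCache() {
        int v = featureVersion(System.getProperty("java.specification.version"));
        return v >= 8 ? "TinyLfuBlockCache" : "LruBlockCache";
    }
}
```

This keeps Java 7 users on the current implementation while letting Java 8 deployments opt into the new cache without a new release line.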


> TinyLFU-based BlockCache
> 
>
> Key: ACCUMULO-4177
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4177
> Project: Accumulo
>  Issue Type: Improvement
>Reporter: Ben Manes
>
> [LruBlockCache|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/LruBlockCache.java]
>  appears to be based on HBase's. I currently have a patch being reviewed in 
> [HBASE-15560|https://issues.apache.org/jira/browse/HBASE-15560] that replaces 
> the pseudo Segmented LRU with the TinyLFU eviction policy. That should allow 
> the cache to make [better 
> predictions|https://github.com/ben-manes/caffeine/wiki/Efficiency] based on 
> frequency and recency, such as improved scan resistance. The implementation 
> uses [Caffeine|https://github.com/ben-manes/caffeine], the successor to 
> Guava's cache, to provide concurrency and keep the patch small.
> Full details are in the JIRA ticket. I think it should be easy to port if 
> there is interest.





[jira] [Created] (ACCUMULO-4173) Balance table within a set of hosts

2016-03-29 Thread Dave Marion (JIRA)
Dave Marion created ACCUMULO-4173:
-

 Summary: Balance table within a set of hosts
 Key: ACCUMULO-4173
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4173
 Project: Accumulo
  Issue Type: Bug
  Components: master
Reporter: Dave Marion
Assignee: Dave Marion
 Fix For: 1.8.0


Create a table balancer that will provide a set of hosts for the table tablet 
balancer to use.





[jira] [Commented] (ACCUMULO-4169) TabletServer.config contextCleaner removes contexts that are not set on a table

2016-03-29 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216092#comment-15216092
 ] 

Dave Marion commented on ACCUMULO-4169:
---

bq. A possible work around on a running system is to add the context to an 
empty table to lock it in memory.

This workaround only works if you create splits for the empty table and a split 
is hosted on each tserver.
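In Accumulo shell terms, the workaround might look like the following. The table name, split points, and context name are illustrative, and you need at least as many hosted splits as tservers:

```
createtable contextholder
addsplits a h p -t contextholder
config -t contextholder -s table.classpath.context=mycontext
```

With a tablet of this table hosted on every tserver, the cleaner sees the context as in use everywhere and will not close it.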

> TabletServer.config contextCleaner removes contexts that are not set on a 
> table
> ---
>
> Key: ACCUMULO-4169
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4169
> Project: Accumulo
>  Issue Type: Bug
>  Components: tserver
>Affects Versions: 1.8.0
>Reporter: Dave Marion
>
> ACCUMULO-3948 added a feature where you could define a context in the 
> Accumulo configuration, not set it on a table, and use it in a Scanner. 
> However, there is a runnable created in TabletServer.config() that runs every 
> 60 seconds and closes contexts that are not defined on a table. I suggest 
> that the context cleaner not close any context defined in the 
> configuration.





[jira] [Updated] (ACCUMULO-4173) Balance table within a set of hosts

2016-03-31 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-4173:
--
Attachment: ACCUMULO-4173-2.patch

> Balance table within a set of hosts
> ---
>
> Key: ACCUMULO-4173
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4173
> Project: Accumulo
>  Issue Type: Bug
>  Components: master
>Reporter: Dave Marion
>Assignee: Dave Marion
>  Labels: balancer
> Fix For: 1.8.0
>
> Attachments: ACCUMULO-4173-1.patch, ACCUMULO-4173-2.patch
>
>
> Create a table balancer that will provide a set of hosts for the table tablet 
> balancer to use.





[jira] [Resolved] (ACCUMULO-3470) Upgrade to Commons VFS 2.1

2016-05-20 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion resolved ACCUMULO-3470.
---
Resolution: Fixed

Bumped the vfs2 version to 2.1, updated imports, removed VFS-related code, and 
ran 'mvn clean test' against all versions.

> Upgrade to Commons VFS 2.1
> --
>
> Key: ACCUMULO-3470
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3470
> Project: Accumulo
>  Issue Type: Task
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0, 2.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Commons VFS 2.1 is nearing release. When released we need to remove the VFS 
> related classes in the start module, update the imports, and update the 
> version in the pom. Will set fixVersions when VFS is released.





[jira] [Commented] (ACCUMULO-3470) Upgrade to Commons VFS 2.1

2016-05-20 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294001#comment-15294001
 ] 

Dave Marion commented on ACCUMULO-3470:
---

They were missed and should be removed. My IDE didn't complain about invalid 
references, so I missed them. I will do it when I get back on Monday.

> Upgrade to Commons VFS 2.1
> --
>
> Key: ACCUMULO-3470
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3470
> Project: Accumulo
>  Issue Type: Task
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0, 2.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Commons VFS 2.1 is nearing release. When released we need to remove the VFS 
> related classes in the start module, update the imports, and update the 
> version in the pom. Will set fixVersions when VFS is released.





[jira] [Commented] (ACCUMULO-3470) Upgrade to Commons VFS 2.1

2016-05-23 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296791#comment-15296791
 ] 

Dave Marion commented on ACCUMULO-3470:
---

Ok, I reverted the commit for updating Commons VFS from 2.0 to 2.1 in the 
Accumulo 1.6 branch. I merged that change up to 1.7, reverted the revert 
commit, and merged that all the way up to master.

> Upgrade to Commons VFS 2.1
> --
>
> Key: ACCUMULO-3470
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3470
> Project: Accumulo
>  Issue Type: Task
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0, 2.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Commons VFS 2.1 is nearing release. When released we need to remove the VFS 
> related classes in the start module, update the imports, and update the 
> version in the pom. Will set fixVersions when VFS is released.





[jira] [Commented] (ACCUMULO-3470) Upgrade to Commons VFS 2.1

2016-05-23 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296813#comment-15296813
 ] 

Dave Marion commented on ACCUMULO-3470:
---

I removed ReadOnlyHdfsFileProviderTest in 1.7 and beyond. I think my work is 
done here.

> Upgrade to Commons VFS 2.1
> --
>
> Key: ACCUMULO-3470
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3470
> Project: Accumulo
>  Issue Type: Task
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0, 2.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Commons VFS 2.1 is nearing release. When released we need to remove the VFS 
> related classes in the start module, update the imports, and update the 
> version in the pom. Will set fixVersions when VFS is released.





[jira] [Commented] (ACCUMULO-3470) Upgrade to Commons VFS 2.1

2016-05-19 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291135#comment-15291135
 ] 

Dave Marion commented on ACCUMULO-3470:
---

VFS 2.1 has been released. I hope to get to this soon.

> Upgrade to Commons VFS 2.1
> --
>
> Key: ACCUMULO-3470
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3470
> Project: Accumulo
>  Issue Type: Task
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 1.8.0
>
>
> Commons VFS 2.1 is nearing release. When released we need to remove the VFS 
> related classes in the start module, update the imports, and update the 
> version in the pom. Will set fixVersions when VFS is released.




