[jira] [Updated] (ACCUMULO-1292) Tablet constructor can hang on vfs classloader, preventing tablets from loading
[ https://issues.apache.org/jira/browse/ACCUMULO-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Marion updated ACCUMULO-1292:
----------------------------------
    Attachment: ACCUMULO-1292-using-locks.patch

[~ecn] Alternate implementation without synchronization.

> Tablet constructor can hang on vfs classloader, preventing tablets from
> loading
> -----------------------------------------------------------------------
>
>                 Key: ACCUMULO-1292
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1292
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.5.0, 1.6.0, 1.6.1
>            Reporter: John Vines
>            Assignee: Eric Newton
>             Fix For: 1.7.0, 1.6.3
>
>         Attachments: ACCUMULO-1292-using-locks.patch, ACCUMULO-1292.patch
>
> Taken from TODO from r1424106 regarding ACCUMULO-867. This is something that
> we should at least look into more before 1.5 is released.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (ACCUMULO-1292) Tablet constructor can hang on vfs classloader, preventing tablets from loading
[ https://issues.apache.org/jira/browse/ACCUMULO-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286813#comment-14286813 ]

Dave Marion commented on ACCUMULO-1292:
---------------------------------------

I will admit that I was distracted trying to help the kids with their homework at the same time. I will give it another look tomorrow. You might be right: I should hold the read lock while checking whether cl is null. The synchronized methods will block each other. The locks allow concurrent readers, but the write lock blocks all reads.
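The hold-the-read-lock-while-checking pattern discussed here could look something like the following minimal sketch. All names are illustrative stand-ins, not Accumulo's actual code, and note that ReentrantReadWriteLock does not permit upgrading a read lock to a write lock, so the read lock must be released before acquiring the write lock:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: hold the read lock while checking whether the cached
// classloader (represented here by a plain Object) is null, and take the
// write lock to (re)build it. Names are illustrative, not Accumulo's API.
public class LazyRef {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Object cl; // stands in for the cached VFS classloader

    public Object get() {
        lock.readLock().lock();
        try {
            if (cl != null) {
                return cl;
            }
        } finally {
            lock.readLock().unlock();
        }
        // Must release the read lock before taking the write lock:
        // ReentrantReadWriteLock does not support read-to-write upgrade.
        lock.writeLock().lock();
        try {
            if (cl == null) { // re-check: another writer may have built it
                cl = build();
            }
            return cl;
        } finally {
            lock.writeLock().unlock();
        }
    }

    private Object build() {
        return new Object(); // placeholder for constructing a classloader
    }
}
```

Readers proceed concurrently on the fast path; only the rebuild briefly blocks all reads.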
[jira] [Commented] (ACCUMULO-1292) Tablet constructor can hang on vfs classloader, preventing tablets from loading
[ https://issues.apache.org/jira/browse/ACCUMULO-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286881#comment-14286881 ]

Dave Marion commented on ACCUMULO-1292:
---------------------------------------

I was trying to remove the synchronization on the getClassLoader method due to the lazy refresh of the classloader, which is the cause of the hang in the tablet server. The FileListener API methods are called by the FileMonitor object in a different thread. I think that in the current version of VFS the FileListener methods are called serially, but I am not 100% sure about that and it may not be true in future versions.

The intention of my patch, even if the current version is flawed, is to refresh in a separate thread when a modification occurs. If another modification occurs while that thread is running, then queue another refresh; if more modifications occur, do nothing. The thought is that the currently executing thread may miss a change that happens, but the thread that is queued and has not started will not miss it. So, at most, we will have one thread performing a refresh and another thread queued up. If for some reason there is an error in the refresh thread, then the next call to getClassLoader will force a refresh in yet another thread (or maybe we do it in the current thread to catch the reason why the background threads were failing).

The fact that we have multiple readers and one writer seemed to fit the ReentrantReadWriteLock well. Maybe I didn't apply it in all cases; that should be easy to fix. Thinking about it now, I do think I can tighten up the time during which the write lock is held in the refresh thread.
[jira] [Commented] (ACCUMULO-1292) Tablet constructor can hang on vfs classloader, preventing tablets from loading
[ https://issues.apache.org/jira/browse/ACCUMULO-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287401#comment-14287401 ]

Dave Marion commented on ACCUMULO-1292:
---------------------------------------

[~elserj] Thanks for the idea about the AtomicReference. I was able to remove all of the lock code.
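A minimal sketch of what the lock-free AtomicReference approach could look like (class and method names here are hypothetical, not the patch's actual code): readers do a plain volatile read on the hot path, while the refresh thread swaps in a fully constructed classloader in a single atomic write.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the AtomicReference approach: getClassLoader()
// needs no synchronization, and the background refresh publishes a new,
// fully built classloader with one atomic set(). Readers always observe
// either the old loader or the new one, never a partial state.
public class ClassLoaderHolder {
    private final AtomicReference<ClassLoader> current =
        new AtomicReference<>(ClassLoaderHolder.class.getClassLoader());

    // Hot path: just a volatile read, no locks.
    public ClassLoader getClassLoader() {
        return current.get();
    }

    // Called by the refresh thread once the replacement loader is ready.
    public void swap(ClassLoader fresh) {
        current.set(fresh);
    }
}
```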
[jira] [Commented] (ACCUMULO-1292) Tablet constructor can hang on vfs classloader, preventing tablets from loading
[ https://issues.apache.org/jira/browse/ACCUMULO-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287957#comment-14287957 ]

Dave Marion commented on ACCUMULO-1292:
---------------------------------------

bq. What would happen with the VFS classloader if a filesystem change happened (jar was replaced), refresh on the classloader is started but takes a long time, and a class load is requested for a class that was from the replaced jar? Is that safe – would it continue to load the old version of the class? Would requests before the classloader is updated fail?

The classloader should provide the old version of the jar until the new classloader is constructed. This behavior is the same as in the previous implementation of the classloader. It would fail only if your application depended on the new jar being available at a certain time, and I don't think we have ever guaranteed a time constraint across all of the tservers. It's eventually consistent within, most likely, two times the refresh interval. The default interval is 30 seconds, but it appears to be overridden in AccumuloVFSClassLoader to 1 second (with a TODO to make it configurable). Having said that, if the thread is hung on I/O to HDFS or some other service that VFS supports (http, ftp, etc.) for retrieving jars, we can't provide any guarantee.

bq. Wouldn't Executors.newSingleThreadExecutor() be more concise? Is your keepAliveTime actually going to do anything with a coreSize of 1?

I want to ensure that one thread is running and that there is a max of two objects in the queue. I don't believe that with Executors.newSingleThreadExecutor() you can change or control the size of the queue. My keepAliveTime should do nothing, but I don't think there is a constructor variant that does not require it.

bq. Need to make sure that the async refreshing thread is a daemon or provide a way to stop the thread.

Good point; we should make it a daemon, maybe by providing a ThreadFactory to the ThreadPoolExecutor.

bq. executor.shutdown();

Good point.
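Under the constraints described here (exactly one refresh thread, a small bounded queue, daemon threads supplied via a ThreadFactory), the ThreadPoolExecutor setup might look like the following sketch. It is illustrative only, not the patch's actual code, and the thread name is made up:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hedged sketch of the executor configuration discussed above: one daemon
// refresh thread, a bounded queue, and a discard policy so a burst of
// file-change events collapses instead of piling up requests.
public class RefreshExecutorFactory {
    public static ThreadPoolExecutor create() {
        ThreadFactory daemonFactory = r -> {
            Thread t = new Thread(r, "vfs-classloader-refresh");
            t.setDaemon(true); // refresher must not keep the JVM alive
            return t;
        };
        return new ThreadPoolExecutor(
            1, 1,                        // exactly one refresh thread
            0L, TimeUnit.MILLISECONDS,   // keepAliveTime is moot when core == max
            new ArrayBlockingQueue<>(2), // at most two queued refreshes
            daemonFactory,
            new ThreadPoolExecutor.DiscardPolicy()); // silently drop extras
    }
}
```

With DiscardPolicy, refresh requests submitted beyond the running-plus-queued cap are dropped; the queued refresh will observe any change those dropped requests would have reported.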
[jira] [Commented] (ACCUMULO-1292) Tablet constructor can hang on vfs classloader, preventing tablets from loading
[ https://issues.apache.org/jira/browse/ACCUMULO-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288095#comment-14288095 ]

Dave Marion commented on ACCUMULO-1292:
---------------------------------------

Small nit, feel free to ignore. The missing paren threw me off:

+ log.trace("Ignoring refresh request (already refreshing");
[jira] [Commented] (ACCUMULO-1292) Tablet constructor can hang on vfs classloader, preventing tablets from loading
[ https://issues.apache.org/jira/browse/ACCUMULO-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288121#comment-14288121 ]

Dave Marion commented on ACCUMULO-1292:
---------------------------------------

We changed the implementation of the refresh. There were already tests in place to verify that the refresh was working, and those tests pass. Do you think we need more in-depth tests?
[jira] [Commented] (ACCUMULO-1292) Tablet constructor can hang on vfs classloader, preventing tablets from loading
[ https://issues.apache.org/jira/browse/ACCUMULO-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288516#comment-14288516 ]

Dave Marion commented on ACCUMULO-1292:
---------------------------------------

Looks good to me. The only thing I saw is something that [~elserj] raised: calling executor.shutdownNow() in close().
[jira] [Commented] (ACCUMULO-2911) setscaniter and setshelliter unable to load class.
[ https://issues.apache.org/jira/browse/ACCUMULO-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292796#comment-14292796 ]

Dave Marion commented on ACCUMULO-2911:
---------------------------------------

[~medined] I think this may have been fixed in ACCUMULO-3093. Good to close?

> setscaniter and setshelliter unable to load class.
> --------------------------------------------------
>
>                 Key: ACCUMULO-2911
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2911
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>            Reporter: David Medinets
>            Priority: Trivial
>
> Problem:
> I can use a custom iterator using the setiter command but the same iterator
> does not work using the setscaniter or setshelliter commands.
> References:
> https://blogs.apache.org/accumulo/entry/the_accumulo_classloader
> http://accumulo.apache.org/1.5/examples/classpath.html
> Description:
> I am using my https://github.com/medined/D4M_Schema project to start
> Accumulo, so the environment that I am using can be duplicated exactly if
> needed. I am using:
> Accumulo: 1.5.0
> Hadoop: 1.2.1
> The classpath settings in accumulo-site.xml are the following (which I think
> are the default):
> <property>
>   <name>general.classpaths</name>
>   <value>
>     $ACCUMULO_HOME/server/target/classes/,
>     $ACCUMULO_HOME/core/target/classes/,
>     $ACCUMULO_HOME/start/target/classes/,
>     $ACCUMULO_HOME/examples/target/classes/,
>     $ACCUMULO_HOME/lib/[^.].$ACCUMULO_VERSION.jar,
>     $ACCUMULO_HOME/lib/[^.].*.jar,
>     $ZOOKEEPER_HOME/zookeeper[^.].*.jar,
>     $HADOOP_HOME/conf,
>     $HADOOP_HOME/[^.].*.jar,
>     $HADOOP_HOME/lib/[^.].*.jar,
>   </value>
>   <description>Classpaths that accumulo checks for updates and class
>   files. When using the Security Manager, please remove the
>   ".../target/classes/" values.</description>
> </property>
> I can load my iterator using setiter but not with setscaniter or setshelliter.
> Here is my do-nothing iterator:
> public class MyIterator extends WrappingIterator implements OptionDescriber {
>   @Override
>   public IteratorOptions describeOptions() {
>     String name = "dummy";
>     String description = "Dummy Description";
>     Map<String,String> namedOptions = new HashMap<String,String>();
>     List<String> unnamedOptionDescriptions = null;
>     return new IteratorOptions(name, description, namedOptions,
>         unnamedOptionDescriptions);
>   }
>   @Override
>   public boolean validateOptions(Map<String,String> options) {
>     return true;
>   }
> }
> I copy the jar file out to HDFS:
> hadoop fs -mkdir /user/vagrant/d4m/classpath
> hadoop fs -put /vagrant/schema/target/d4m_schema-0.0.1-SNAPSHOT.jar /user/vagrant/classpath
> I set the table-specific classpath context:
> createtable atest
> table atest
> insert row cf cq value
> config -s general.vfs.context.classpath.d4m=hdfs://affy-master:9000/user/vagrant/classpath
> config -t atest -s table.classpath.context=d4m
> Now I can configure the iterator and scan over the single row without a
> problem:
> setiter -n MyIterator -p 10 -scan -minc -majc -class com.codebits.d4m.iterator.MyIterator
> scan
> deleteiter -n MyIterator -scan -minc -majc
> However, the setscaniter command fails:
> root@instance atest> setscaniter -n MyIterator -p 10 -class com.codebits.d4m.iterator.MyIterator
> 2014-06-15 02:54:14,098 [shell.Shell] WARN : Deprecated, use setshelliter
> Dummy Description
> 2014-06-15 02:54:14,126 [shell.Shell] ERROR: org.apache.accumulo.core.util.shell.ShellCommandException: Command could not be initialized (Unable to load com.codebits.d4m.iterator.MyIterator)
> As does the setshelliter:
> root@instance atest> setshelliter -pn d4m -n MyIterator -p 10 -class com.codebits.d4m.iterator.MyIterator
> Dummy Description
> 2014-06-15 02:55:07,025 [shell.Shell] ERROR: org.apache.accumulo.core.util.shell.ShellCommandException: Command could not be initialized (Unable to load com.codebits.d4m.iterator.MyIterator)
> I don't see any messages in the log files.
[jira] [Created] (ACCUMULO-3552) CopyFailed on BulkImport leaves metadata entries
Dave Marion created ACCUMULO-3552:
----------------------------------

             Summary: CopyFailed on BulkImport leaves metadata entries
                 Key: ACCUMULO-3552
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3552
             Project: Accumulo
          Issue Type: Bug
          Components: master
    Affects Versions: 1.6.1
            Reporter: Dave Marion
            Assignee: Eric Newton

Fate shows that a BulkImport failed in CopyFailed. The metadata table shows files with the loaded prefix for tablets from this bulk import. Failing and deleting the fate transaction, then compacting the tablet, does not remove these entries.
[jira] [Commented] (ACCUMULO-3555) TabletServerBatchReaderIterator doesn't maintain reference to TabletServerBatchReader
[ https://issues.apache.org/jira/browse/ACCUMULO-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303491#comment-14303491 ]

Dave Marion commented on ACCUMULO-3555:
---------------------------------------

bq. One potential change would be to introduce the reference in the TSBRI to TSBR and then add a hook to automatically close() the TSBR when the TSBRI exhausts itself.

This would change the behavior we have today, where I can do the following:

{code}
BatchScanner bs = conn.createBatchScanner(tableName, Authorizations.EMPTY, 4);
bs.setRanges(Collections.singleton(new Range()));
Iterator<Entry<Key,Value>> iterator = bs.iterator();
doThings(iterator);
iterator = bs.iterator();
doDifferentThings(iterator);
{code}

> TabletServerBatchReaderIterator doesn't maintain reference to
> TabletServerBatchReader
> -------------------------------------------------------------
>
>                 Key: ACCUMULO-3555
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3555
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.6.1
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>
> Had a user in IRC run into this again today upgrading a 1.4 instance to 1.6.0.
> ACCUMULO-587 introduced a {{finalize}} implementation into
> {{TabletServerBatchReader}} in an attempt to close the {{BatchScanner}} when
> the user might have forgotten to do so themselves. The problem, however, is
> that the {{TabletServerBatchReaderIterator}} doesn't maintain a reference to
> the {{TabletServerBatchReader}} (notice how it only uses it to create a new
> instance of {{ScannerOptions}} using the copy constructor).
> In other words, when the {{TabletServerBatchReaderIterator}} is constructed,
> it has no references in the object graph to the {{TabletServerBatchReader}}
> it was created from. This means that if clients don't hold onto the
> BatchScanner instance, it's possible that it gets closed by the JVM calling
> {{finalize()}}.
[jira] [Commented] (ACCUMULO-3555) TabletServerBatchReaderIterator doesn't maintain reference to TabletServerBatchReader
[ https://issues.apache.org/jira/browse/ACCUMULO-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303512#comment-14303512 ]

Dave Marion commented on ACCUMULO-3555:
---------------------------------------

Looking at the 1.6 code: close() just shuts down the query thread pool, and TSBR.iterator() just does some state validation. It could be as simple as TSBR.iterator() creating a new thread pool for each invocation, but that could exhaust resources. If there are any non-backwards-compatible changes here, then this will need to be in a major version (2.0).
[jira] [Commented] (ACCUMULO-3555) TabletServerBatchReaderIterator doesn't maintain reference to TabletServerBatchReader
[ https://issues.apache.org/jira/browse/ACCUMULO-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303520#comment-14303520 ]

Dave Marion commented on ACCUMULO-3555:
---------------------------------------

bq. If there are any non-backwards compatible changes here, then this will need to be in a major version (2.0).

Let me re-state that: if changes here cause users to change their usage of BatchScanner, then this will need to be in a major version.
[jira] [Commented] (ACCUMULO-3555) TabletServerBatchReaderIterator doesn't maintain reference to TabletServerBatchReader
[ https://issues.apache.org/jira/browse/ACCUMULO-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303552#comment-14303552 ]

Dave Marion commented on ACCUMULO-3555:
---------------------------------------

One option might be to mark ScannerBase with the Closeable interface. Then, if the user uses a BatchScanner or Scanner in a try-with-resources statement, they won't have to worry about cleaning it up and can still use it in the manner from my example. In this case, we could modify the examples and javadoc for the BatchScanner to suggest using it in a try-with-resources statement; if the user doesn't do this, then they need to maintain a reference to the BatchScanner and close it when they are done with it.
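As a self-contained illustration of the try-with-resources idea (using a stand-in class rather than Accumulo's real BatchScanner, which would require the full client library), close() runs automatically when the block exits, even if processing throws:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// A stand-in "scanner" (not Accumulo's BatchScanner) that implements
// AutoCloseable and Iterable, to show the try-with-resources usage pattern
// suggested above. The closed flag lets us observe that close() ran.
class StubScanner implements AutoCloseable, Iterable<String> {
    boolean closed = false;
    private final List<String> rows = Arrays.asList("a", "b", "c");

    @Override public Iterator<String> iterator() { return rows.iterator(); }
    @Override public void close() { closed = true; }
}

public class TryWithResourcesDemo {
    // Iterates the stub scanner inside try-with-resources and reports
    // whether it was closed when the block exited.
    public static boolean scanAndReport() {
        StubScanner s = new StubScanner();
        try (StubScanner scanner = s) {
            for (String row : scanner) { /* process row */ }
        } // scanner.close() is invoked here automatically
        return s.closed;
    }
}
```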
[jira] [Commented] (ACCUMULO-3555) TabletServerBatchReaderIterator doesn't maintain reference to TabletServerBatchReader
[ https://issues.apache.org/jira/browse/ACCUMULO-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303595#comment-14303595 ]

Dave Marion commented on ACCUMULO-3555:
---------------------------------------

Well, I had a nice long response, but lost it when I clicked "Add".
[jira] [Commented] (ACCUMULO-3555) TabletServerBatchReaderIterator doesn't maintain reference to TabletServerBatchReader
[ https://issues.apache.org/jira/browse/ACCUMULO-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303644#comment-14303644 ] Dave Marion commented on ACCUMULO-3555: --- The reference to the TSBR is passed to the TSBRI, but it then copied in the constructor and discarded. If it was not a copy, then in next() when hasNext() is false, you could call close() on the reference. I would suggest that most of the examples so far are basic and could be covered with a try-with-resources statement. I would also say that there are more advanced uses of BatchScanner in which you don't want it to AUTOCLOSE. I have not looked through the code enough, but maybe if the BatchScanner were created like the following, then you could use the logic I mentioned above. {code} BatchScanner bs = conn.createBatchScanner(tableName, Authorizations.EMPTY, 4, BatchScanner.AUTOCLOSE); {code} > TabletServerBatchReaderIterator doesn't maintain reference to > TabletServerBatchReader > - > > Key: ACCUMULO-3555 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3555 > Project: Accumulo > Issue Type: Bug > Components: client >Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.6.1 >Reporter: Josh Elser >Assignee: Josh Elser >Priority: Blocker > > Had a user in IRC run into this again today upgrading a 1.4 instance to 1.6.0. > ACCUMULO-587 introduced a {{finalize}} implementation into > {{TabletServerBatchReader}} in an attempt to close the {{BatchScanner}} when > the user might have forgotten to do so themselves. The problem, however, is > that the {{TabletServerBatchReaderIterator}} doesn't maintain a reference to > the {{TabletServerBatchReader}} (notice how it only uses it to create a new > instnace of {{ScannerOptions}} using the copy constructor). > In other words, when the {{TabletServerBatchReaderIterator}} is constructed, > it has no references in the object graph to the {{TabletServerBatchReader}} > it was created from. 
This means that if clients don't hold onto the > BatchScanner instance, it's possible that it gets closed by the JVM calling > {{finalize()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
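The close-on-scope-exit behavior that try-with-resources provides (versus relying on the GC to call {{finalize()}}) can be demonstrated with a small self-contained sketch. {{FakeBatchScanner}} below is a hypothetical stand-in, not the real Accumulo {{BatchScanner}}:

```java
// Minimal sketch of the try-with-resources suggestion above. FakeBatchScanner
// is a hypothetical stand-in for a scanner-like resource: it only demonstrates
// that close() runs deterministically when the block exits, so nothing depends
// on the GC ever invoking finalize().
class FakeBatchScanner implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true; // the real client would release server-side resources here
    }
}

public class TryWithResourcesDemo {
    public static boolean demo() {
        FakeBatchScanner scanner = new FakeBatchScanner();
        try (FakeBatchScanner s = scanner) {
            // iterate over entries here; s.close() is called automatically
        }
        return scanner.closed; // true: closed on block exit, even on exception
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "true"
    }
}
```

This is why the basic usage patterns discussed above don't need {{finalize()}} at all: the resource is released as soon as the client leaves the block.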
[jira] [Commented] (ACCUMULO-3645) Iterators not run at compaction when tablets are empty
[ https://issues.apache.org/jira/browse/ACCUMULO-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352358#comment-14352358 ] Dave Marion commented on ACCUMULO-3645: --- I don't see how this is a bug. It would appear to be by design or optimization that iterators do not run on tables that have no data. Also, I'm not sold on the idea of using iterators on an empty table to execute business logic or stored procedures, but if this is what we are going to do then I suggest: 1. Create a new issue of type "New Feature" that describes the approach 2. Make this and all other supporting issues subtasks under the new top-level issue. > Iterators not run at compaction when tablets are empty > -- > > Key: ACCUMULO-3645 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3645 > Project: Accumulo > Issue Type: Bug > Components: client, mini, tserver >Affects Versions: 1.6.1 > Environment: Ubuntu 14.04, Accumulo 1.6.1, Hadoop 2.6.0, Zookeeper > 3.4.6 >Reporter: Dylan Hutchison > > When an iterator is configured to run during a one-time manual major > compaction on a tablet that is empty, the iterator never runs. It seems as if > Accumulo never constructs the SKVI stack at all, because it tries to optimize > by eliminating "unnecessary" tablet compactions. Also true of MiniAccumulo. > This is bad for "generator" iterators that emit keys (in sorted order) not > present in the tablet's data. They never have a chance to run. > A workaround is for the client to insert "dummy data" into the tablets he > wants to compact, before starting a compaction with "generator" iterators. > Note that *any* data is sufficient, even delete keys, to ensure the SKVI > stack is built and run through. 
> Test file: > [InjectTest|https://github.com/Accla/d4m_api_java/blob/master/src/test/java/edu/mit/ll/graphulo/InjectTest.java] > * method {{testInjectOnCompact_Empty}} fails because no data is in the tablets > * method {{testInjectOnCompact}} passes because data is written into the > tablets before initiating compaction. > Two logs follow. Both have a > [DebugIterator|https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/DebugIterator.html] > set after an > [InjectIterator|https://github.com/Accla/d4m_api_java/blob/master/src/main/java/edu/mit/ll/graphulo/InjectIterator.java], > an SKVI which "generates" hard-coded entries. The root user scans the table > after flush and compact. > Logs when no data in the tablets: > {code} > 2015-03-05 06:58:04,914 [tserver.TabletServer] DEBUG: Got flush message from > user: !SYSTEM > 2015-03-05 06:58:04,974 [tserver.TabletServer] DEBUG: ScanSess tid > 127.0.0.1:60455 !0 6 entries in 0.01 secs, nbTimes = [4 4 4.00 1] > 2015-03-05 06:58:04,992 [tserver.TabletServer] DEBUG: MultiScanSess > 127.0.0.1:60455 15 entries in 0.01 secs (lookup_time:0.01 secs tablets:1 > ranges:1) > 2015-03-05 06:58:05,018 [tserver.TabletServer] DEBUG: MultiScanSess > 127.0.0.1:60455 2 entries in 0.00 secs (lookup_time:0.00 secs tablets:1 > ranges:1) > 2015-03-05 06:58:05,052 [tserver.TabletServer] DEBUG: ScanSess tid > 127.0.0.1:60455 !0 4 entries in 0.00 secs, nbTimes = [2 2 2.00 1] > 2015-03-05 06:58:05,054 [tserver.TabletServer] DEBUG: Got compact message > from user: !SYSTEM > 2015-03-05 06:58:05,135 [tserver.TabletServer] DEBUG: MultiScanSess > 127.0.0.1:60455 15 entries in 0.01 secs (lookup_time:0.01 secs tablets:1 > ranges:1) > 2015-03-05 06:58:05,138 [tserver.TabletServer] DEBUG: MultiScanSess > 127.0.0.1:36422 2 entries in 0.00 secs (lookup_time:0.00 secs tablets:1 > ranges:1) > 2015-03-05 06:58:05,384 [tserver.TabletServer] DEBUG: MultiScanSess > 127.0.0.1:36422 2 entries in 0.00 secs (lookup_time:0.00 secs tablets:1 > 
ranges:1) > 2015-03-05 06:58:05,386 [tserver.TabletServer] DEBUG: MultiScanSess > 127.0.0.1:60455 15 entries in 0.01 secs (lookup_time:0.01 secs tablets:1 > ranges:1) > 2015-03-05 06:58:05,504 [tserver.TabletServer] DEBUG: MultiScanSess > 127.0.0.1:60455 2 entries in 0.00 secs (lookup_time:0.00 secs tablets:1 > ranges:1) > 2015-03-05 06:58:05,509 [tserver.TabletServer] DEBUG: MultiScanSess > 127.0.0.1:36422 15 entries in 0.01 secs (lookup_time:0.01 secs tablets:1 > ranges:1) > 2015-03-05 06:58:06,069 [tserver.TabletServer] DEBUG: ScanSess tid > 127.0.0.1:60455 !0 6 entries in 0.01 secs, nbTimes = [3 3 3.00 1] > 2015-03-05 06:58:06,158 [Audit ] INFO : operation: permitted; user: root; > client: 127.0.0.1:36456; > 2015-03-05 06:58:06,158 [Audit ] INFO : operation: permitted; user: root; > client: 127.0.0.1:36456; > 2015-03-05 06:58:06,168 [tserver.TabletServer] DEBUG: MultiScanSess > 127.0.0.1:36456 2 entries in 0.01 secs (lookup_time:0.00 secs tablets:1 > ranges:1) > 2015-03-05 06:58:06,174 [Audit
[jira] [Commented] (ACCUMULO-3694) Include root password recovery steps in docs
[ https://issues.apache.org/jira/browse/ACCUMULO-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378279#comment-14378279 ] Dave Marion commented on ACCUMULO-3694: --- I'm pretty sure that you can run the dumpConfig admin command without needing the root password, which means that you can capture all of the users and their granted permissions before you reset. This should help in recreating the users. > Include root password recovery steps in docs > > > Key: ACCUMULO-3694 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3694 > Project: Accumulo > Issue Type: Improvement > Components: docs >Reporter: Frans Lawaetz >Priority: Minor > > It would be helpful to add, perhaps in the Troubleshooting section of the > docs, the semi-hidden feature that is: > accumulo init --reset-security > As a stab: > Q: I've lost the accumulo root password, now what? > A: Running "accumulo init --reset-security" will prompt you for a new root > password. CAUTION: this command will delete all existing users. You will > need to re-create all other users and set permissions accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3569) Automatically restart accumulo processes intelligently
[ https://issues.apache.org/jira/browse/ACCUMULO-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481482#comment-14481482 ] Dave Marion commented on ACCUMULO-3569: --- Did you look at YAJSW? Looks like it moved to an Apache license after version 12. [1] http://yajsw.sourceforge.net/ > Automatically restart accumulo processes intelligently > -- > > Key: ACCUMULO-3569 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3569 > Project: Accumulo > Issue Type: Bug > Components: scripts >Reporter: John Vines > Fix For: 1.8.0 > > Attachments: > 0001-ACCUMULO-3569-initial-pass-at-integrating-auto-resta.patch > > > On occasion a process will die, for a variety of reasons. Some reasons are > critical whereas others may be due to momentary blips. There are a variety of > reasons, but not all of the reasons warrant keeping the server down and > requiring human attention. > With that, I would like to propose a watcher process, which is an optional > component that wraps the calls to the various processes (tserver, master, > etc.). This process can watch the processes, get their exit codes, read their > logs, etc. and make intelligent decisions about how to behave. This behavior > would include coarse detection of failure types (will discuss below) and a > configurable response behavior around how many attempts should be made in a > given window before giving up entirely. > As for failure types, there are a few arch ones that seem to be regularly > repeating that I think are prime candidates for an initial approach- > Zookeeper lock lost - this can happen for a variety of reasons, mostly > related to network issues or server (tserver or zk node) congestion. These > are some of the most common errors and are typically transient. However, if > these occur with great frequency then it's a sign of a larger issue that > needs to be handled by an administrator. 
> Jvm OOM - There are two spaces where these really seem to occur - a system > that's just poorly configured and dies shortly after it starts up and then > there is the case where the system gets slammed in just the right way where > objects in our code and/or the iterator stack may push the JVM just over the > limits. In the former case, this will fail quickly and relatively rapidly > when being restarted, whereas the latter case is something that will occur > rarely and will want attention, but doesn't warrant keeping the node offline > in the meantime. > Standard shutdown - this is just a case that occurs where we don't want it to > automatically interact because we want it to go down. Just a design > consideration. > Unexpected exceptions - this is a catch all for everything else. We can > attempt to enumerate them, but they're less common. This would be something > configured to have less tolerance for, but just because a server goes down > due to a random software bug doesn't mean that server should be removed from > the cluster unless it happens repeatedly (because then it's a sign of a > hardware/system issue). But we should provide the ability to keep resources > available in this space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
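The "how many attempts should be made in a given window before giving up entirely" behavior described above can be sketched as a small policy class. All names here are invented for illustration; this is not code from the attached patch or from JSW/YAJSW:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a bounded-restart policy: permit at most maxRestarts
// within a sliding window of windowMillis, then give up and leave the process
// down for an administrator. Times are passed in explicitly so the logic is
// easy to test; a real watcher would use System.currentTimeMillis().
public class RestartPolicy {
    private final int maxRestarts;
    private final long windowMillis;
    private final Deque<Long> restarts = new ArrayDeque<>();

    public RestartPolicy(int maxRestarts, long windowMillis) {
        this.maxRestarts = maxRestarts;
        this.windowMillis = windowMillis;
    }

    /** Returns true if another restart is permitted at time nowMillis. */
    public boolean shouldRestart(long nowMillis) {
        // drop restart records that have aged out of the sliding window
        while (!restarts.isEmpty() && nowMillis - restarts.peekFirst() > windowMillis) {
            restarts.removeFirst();
        }
        if (restarts.size() >= maxRestarts) {
            return false; // too many recent failures: a sign of a larger issue
        }
        restarts.addLast(nowMillis);
        return true;
    }
}
```

A watcher could keep one policy per failure category (ZK lock lost, OOM, unexpected exception), each with its own limits, so transient blips get retried aggressively while repeated OOMs quickly escalate to a human.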
[jira] [Commented] (ACCUMULO-3569) Automatically restart accumulo processes intelligently
[ https://issues.apache.org/jira/browse/ACCUMULO-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481627#comment-14481627 ] Dave Marion commented on ACCUMULO-3569: --- I don't plan on standing in the way here, although it does appear that others have raised concerns. I was merely asking if you had looked at JSW / YAJSW. These tools have been around for years. JSW's license precludes its use, I think, and the license for YAJSW did also until recently. > Automatically restart accumulo processes intelligently > -- > > Key: ACCUMULO-3569 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3569 > Project: Accumulo > Issue Type: Bug > Components: scripts >Reporter: John Vines > Fix For: 1.8.0 > > Attachments: > 0001-ACCUMULO-3569-initial-pass-at-integrating-auto-resta.patch > > > On occasion a process will die, for a variety of reasons. Some reasons are > critical whereas others may be due to momentary blips. There are a variety of > reasons, but not all of the reasons warrant keeping the server down and > requiring human attention. > With that, I would like to propose a watcher process, which is an optional > component that wraps the calls to the various processes (tserver, master, > etc.). This process can watch the processes, get their exit codes, read their > logs, etc. and make intelligent decisions about how to behave. This behavior > would include coarse detection of failure types (will discuss below) and a > configurable response behavior around how many attempts should be made in a > given window before giving up entirely. > As for failure types, there are a few arch ones that seem to be regularly > repeating that I think are prime candidates for an initial approach- > Zookeeper lock lost - this can happen for a variety of reasons, mostly > related to network issues or server (tserver or zk node) congestion. These > are some of the most common errors and are typically transient. 
However, if > these occur with great frequency then it's a sign of a larger issue that > needs to be handled by an administrator. > Jvm OOM - There are two spaces where these really seem to occur - a system > that's just poorly configured and dies shortly after it starts up and then > there is the case where the system gets slammed in just the right way where > objects in our code and/or the iterator stack may push the JVM just over the > limits. In the former case, this will fail quickly and relatively rapidly > when being restarted, whereas the latter case is something that will occur > rarely and will want attention, but doesn't warrant keeping the node offline > in the meantime. > Standard shutdown - this is just a case that occurs where we don't want it to > automatically interact because we want it to go down. Just a design > consideration. > Unexpected exceptions - this is a catch all for everything else. We can > attempt to enumerate them, but they're less common. This would be something > configured to have less tolerance for, but just because a server goes down > due to a random software bug doesn't mean that server should be removed from > the cluster unless it happens repeatedly (because then it's a sign of a > hardware/system issue). But we should provide the ability to keep resources > available in this space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ACCUMULO-3762) RestoreZookeeper does not work with xml generated by DumpZookeeper
Dave Marion created ACCUMULO-3762: - Summary: RestoreZookeeper does not work with xml generated by DumpZookeeper Key: ACCUMULO-3762 URL: https://issues.apache.org/jira/browse/ACCUMULO-3762 Project: Accumulo Issue Type: Bug Affects Versions: 1.6.2 Reporter: Dave Marion Priority: Minor RestoreZookeeper does not handle the ephemeral nodes in the xml and causes the stack to become empty. java.util.NoSuchElementException at java.util.Vector.lastElement(Vector.java:499) at org.apache.accumulo.server.util.RestoreZookeeper$Restore.startElement(RestoreZookeeper.java:66) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:509) at com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(AbstractXMLDocumentParser.java:182) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1350) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2778) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777) at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:333) at javax.xml.parsers.SAXParser.parse(SAXParser.java:195) at 
org.apache.accumulo.server.util.RestoreZookeeper.run(RestoreZookeeper.java:127) at org.apache.accumulo.server.util.RestoreZookeeper.main(RestoreZookeeper.java:121) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.accumulo.start.Main$1.run(Main.java:141) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3762) RestoreZookeeper does not work with xml generated by DumpZookeeper
[ https://issues.apache.org/jira/browse/ACCUMULO-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518500#comment-14518500 ] Dave Marion commented on ACCUMULO-3762: --- Looks like we can just push an empty node on the stack when an ephemeral type is encountered. > RestoreZookeeper does not work with xml generated by DumpZookeeper > -- > > Key: ACCUMULO-3762 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3762 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.6.2 >Reporter: Dave Marion >Priority: Minor > > RestoreZookeeper does not handle the ephemeral nodes in the xml and causes > the stack to become empty. > java.util.NoSuchElementException > at java.util.Vector.lastElement(Vector.java:499) > at > org.apache.accumulo.server.util.RestoreZookeeper$Restore.startElement(RestoreZookeeper.java:66) > at > com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:509) > at > com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(AbstractXMLDocumentParser.java:182) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1350) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2778) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777) > at > com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141) > at > 
com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213) > at > com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649) > at > com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:333) > at javax.xml.parsers.SAXParser.parse(SAXParser.java:195) > at > org.apache.accumulo.server.util.RestoreZookeeper.run(RestoreZookeeper.java:127) > at > org.apache.accumulo.server.util.RestoreZookeeper.main(RestoreZookeeper.java:121) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.accumulo.start.Main$1.run(Main.java:141) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
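The "push an empty node on the stack" idea can be sketched with a self-contained SAX handler. The element names used here ("dump", "ephemeral") and the handler structure are assumptions for the demo, not necessarily what DumpZookeeper emits or what RestoreZookeeper's handler looks like; the point is only that pushing a placeholder for ephemeral entries keeps the stack balanced so the matching endElement pop never underflows:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Stack;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch of the proposed fix: when an ephemeral entry is seen,
// push a placeholder onto the path stack (rather than recreating the node or
// skipping the push), so every startElement has a matching pop in endElement.
public class EphemeralStackDemo {

    /** Parses the dump and returns the deepest stack size seen (no underflow). */
    public static int parse(String xml) {
        final int[] maxDepth = {0};
        DefaultHandler handler = new DefaultHandler() {
            private final Stack<String> cwd = new Stack<>();

            @Override
            public void startElement(String uri, String local, String qName, Attributes attrs) {
                if (cwd.isEmpty()) {
                    cwd.push("");                    // root element of the dump
                } else if ("ephemeral".equals(qName)) {
                    cwd.push(cwd.lastElement());     // placeholder: do not recreate ephemerals
                } else {
                    cwd.push(cwd.lastElement() + "/" + attrs.getValue("name"));
                }
                maxDepth[0] = Math.max(maxDepth[0], cwd.size());
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                cwd.pop();                           // balanced: the stack never empties early
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return maxDepth[0];
    }
}
```

Without the placeholder push, the pop for the ephemeral element's endElement (or the lastElement() call for its children) would hit an empty stack, which matches the NoSuchElementException from Vector.lastElement in the reported trace.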
[jira] [Assigned] (ACCUMULO-3762) RestoreZookeeper does not work with xml generated by DumpZookeeper
[ https://issues.apache.org/jira/browse/ACCUMULO-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion reassigned ACCUMULO-3762: - Assignee: Dave Marion > RestoreZookeeper does not work with xml generated by DumpZookeeper > -- > > Key: ACCUMULO-3762 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3762 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.6.2 >Reporter: Dave Marion >Assignee: Dave Marion >Priority: Minor > > RestoreZookeeper does not handle the ephemeral nodes in the xml and causes > the stack to become empty. > {noformat} > java.util.NoSuchElementException > at java.util.Vector.lastElement(Vector.java:499) > at > org.apache.accumulo.server.util.RestoreZookeeper$Restore.startElement(RestoreZookeeper.java:66) > at > com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:509) > at > com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(AbstractXMLDocumentParser.java:182) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1350) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2778) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777) > at > com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141) > at > com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213) > at > 
com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649) > at > com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:333) > at javax.xml.parsers.SAXParser.parse(SAXParser.java:195) > at > org.apache.accumulo.server.util.RestoreZookeeper.run(RestoreZookeeper.java:127) > at > org.apache.accumulo.server.util.RestoreZookeeper.main(RestoreZookeeper.java:121) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.accumulo.start.Main$1.run(Main.java:141) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ACCUMULO-3762) RestoreZookeeper does not work with xml generated by DumpZookeeper
[ https://issues.apache.org/jira/browse/ACCUMULO-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion updated ACCUMULO-3762: -- Fix Version/s: 1.6.3 1.7.0 > RestoreZookeeper does not work with xml generated by DumpZookeeper > -- > > Key: ACCUMULO-3762 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3762 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.6.2 >Reporter: Dave Marion >Assignee: Dave Marion >Priority: Minor > Fix For: 1.7.0, 1.6.3 > > Time Spent: 20m > Remaining Estimate: 0h > > RestoreZookeeper does not handle the ephemeral nodes in the xml and causes > the stack to become empty. > {noformat} > java.util.NoSuchElementException > at java.util.Vector.lastElement(Vector.java:499) > at > org.apache.accumulo.server.util.RestoreZookeeper$Restore.startElement(RestoreZookeeper.java:66) > at > com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:509) > at > com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(AbstractXMLDocumentParser.java:182) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1350) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2778) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777) > at > com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141) > at > 
com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213) > at > com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649) > at > com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:333) > at javax.xml.parsers.SAXParser.parse(SAXParser.java:195) > at > org.apache.accumulo.server.util.RestoreZookeeper.run(RestoreZookeeper.java:127) > at > org.apache.accumulo.server.util.RestoreZookeeper.main(RestoreZookeeper.java:121) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.accumulo.start.Main$1.run(Main.java:141) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (ACCUMULO-3762) RestoreZookeeper does not work with xml generated by DumpZookeeper
[ https://issues.apache.org/jira/browse/ACCUMULO-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion resolved ACCUMULO-3762. --- Resolution: Fixed > RestoreZookeeper does not work with xml generated by DumpZookeeper > -- > > Key: ACCUMULO-3762 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3762 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.6.2 >Reporter: Dave Marion >Assignee: Dave Marion >Priority: Minor > Fix For: 1.7.0, 1.6.3 > > Time Spent: 20m > Remaining Estimate: 0h > > RestoreZookeeper does not handle the ephemeral nodes in the xml and causes > the stack to become empty. > {noformat} > java.util.NoSuchElementException > at java.util.Vector.lastElement(Vector.java:499) > at > org.apache.accumulo.server.util.RestoreZookeeper$Restore.startElement(RestoreZookeeper.java:66) > at > com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:509) > at > com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(AbstractXMLDocumentParser.java:182) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1350) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2778) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777) > at > com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141) > at > com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213) > 
at > com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649) > at > com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:333) > at javax.xml.parsers.SAXParser.parse(SAXParser.java:195) > at > org.apache.accumulo.server.util.RestoreZookeeper.run(RestoreZookeeper.java:127) > at > org.apache.accumulo.server.util.RestoreZookeeper.main(RestoreZookeeper.java:121) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.accumulo.start.Main$1.run(Main.java:141) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3793) enable numa memory interleaving
[ https://issues.apache.org/jira/browse/ACCUMULO-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538138#comment-14538138 ] Dave Marion commented on ACCUMULO-3793: --- Agreed though that we can't or shouldn't set kernel parameters in the bin scripts. They should be documented somewhere like the swappiness setting. > enable numa memory interleaving > --- > > Key: ACCUMULO-3793 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3793 > Project: Accumulo > Issue Type: Improvement > Components: scripts > Environment: Large VM >Reporter: Eric Newton >Assignee: Eric Newton > Fix For: 1.6.3, 1.8.0, 1.7.1 > > > In this [fine article|http://tinyurl.com/m8lt7v2] it is recommended that NUMA > optimizations be disabled. This seems to be exactly our use-case as well, and > we have been struggling with ways to manage the linux page cache on large > production systems. Do this by default in the start scripts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3923) VFS ClassLoader doesnt' work with KeywordExecutable
[ https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602151#comment-14602151 ] Dave Marion commented on ACCUMULO-3923: --- Loading of resource files on the classpath is a bug in vfs 2.0, fixed in 2.1. I have been trying to get them to release vfs 2.1 for months. We replaced the vfs 2.0 jar with the 2.1 snapshot to get this capability. It works at runtime, but does not compile with 2.1 snapshot due to api differences. > VFS ClassLoader doesnt' work with KeywordExecutable > --- > > Key: ACCUMULO-3923 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3923 > Project: Accumulo > Issue Type: Bug >Reporter: Josh Elser >Priority: Critical > Fix For: 1.7.1, 1.8.0 > > > Trying to make the VFS classloading stuff work and it doesn't seem like > ServiceLoader is finding any of the KeywordExecutable implementations. > Best I can tell after looking into this, VFSClassLoader (created by > AccumuloVFSClassLoader) has all of the jars listed as resources, but when > ServiceLoader tries to find the META-INF/services definitions, it returns > nothing, and thus we think the keyword must be a class name. Seems like a > commons-vfs bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3923) VFS ClassLoader doesnt' work with KeywordExecutable
[ https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602155#comment-14602155 ] Dave Marion commented on ACCUMULO-3923: --- https://issues.apache.org/jira/plugins/servlet/mobile#issue/VFS-500 > VFS ClassLoader doesnt' work with KeywordExecutable > --- > > Key: ACCUMULO-3923 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3923 > Project: Accumulo > Issue Type: Bug >Reporter: Josh Elser >Priority: Critical > Fix For: 1.7.1, 1.8.0 > > > Trying to make the VFS classloading stuff work and it doesn't seem like > ServiceLoader is finding any of the KeywordExecutable implementations. > Best I can tell after looking into this, VFSClassLoader (created by > AccumuloVFSClassLoader) has all of the jars listed as resources, but when > ServiceLoader tries to find the META-INF/services definitions, it returns > nothing, and thus we think the keyword must be a class name. Seems like a > commons-vfs bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3923) VFS ClassLoader doesnt' work with KeywordExecutable
[ https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602195#comment-14602195 ] Dave Marion commented on ACCUMULO-3923: --- Ugh, went back and looked and I tried to start the conversation a year ago. I'll look at your patch on the other issue. > VFS ClassLoader doesnt' work with KeywordExecutable > --- > > Key: ACCUMULO-3923 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3923 > Project: Accumulo > Issue Type: Bug >Reporter: Josh Elser >Priority: Critical > Fix For: 1.7.1, 1.8.0 > > > Trying to make the VFS classloading stuff work and it doesn't seem like > ServiceLoader is finding any of the KeywordExecutable implementations. > Best I can tell after looking into this, VFSClassLoader (created by > AccumuloVFSClassLoader) has all of the jars listed as resources, but when > ServiceLoader tries to find the META-INF/services definitions, it returns > nothing, and thus we think the keyword must be a class name. Seems like a > commons-vfs bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3783) Unexpected Filesystem Closed exceptions
[ https://issues.apache.org/jira/browse/ACCUMULO-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602210#comment-14602210 ] Dave Marion commented on ACCUMULO-3783: --- I think the shutdown hook was necessary to close VFS connections to external resources as VFS has the ability to pull resources via http, ftp, hdfs, etc. It would be nice if we could just nuke the shutdown hook in AccumuloVFSClassLoader, but I'm not sure that's possible. Regarding the HDFS VFS objects, these same objects exist in the VFS 2.1 release. My intention was to remove the ones in Accumulo when 2.1 is ultimately released. If you make changes to them, please be sure to make the corresponding changes in Commons VFS. I would suggest building VFS 2.1 with these changes first, before committing to Accumulo, to ensure that all of the VFS tests pass. > Unexpected Filesystem Closed exceptions > --- > > Key: ACCUMULO-3783 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3783 > Project: Accumulo > Issue Type: Bug > Components: master, start, tserver >Affects Versions: 1.7.0 >Reporter: Josh Elser >Assignee: Josh Elser > Fix For: 1.7.1, 1.8.0 > > Attachments: ACCUMULO-3783.patch > > > Noticed this in testing 1.7.0 on my laptop tonight. Started two tservers, one > continuous ingest client and would kill/restart one of the tservers > occasionally. 
> {noformat} > Failed to close map file > java.io.IOException: Filesystem closed > at > org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:795) > at > org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:629) > at java.io.FilterInputStream.close(FilterInputStream.java:181) > at > org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.close(CachableBlockFile.java:409) > at > org.apache.accumulo.core.file.rfile.RFile$Reader.close(RFile.java:921) > at > org.apache.accumulo.tserver.tablet.Compactor.compactLocalityGroup(Compactor.java:391) > at > org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:214) > at > org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:1981) > at > org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:2098) > at > org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:44) > at > org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > null > java.nio.channels.ClosedChannelException > at > org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1622) > at > org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:104) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > at > org.apache.accumulo.core.file.rfile.bcfile.SimpleBufferedOutputStream.flushBuffer(SimpleBufferedOutputStream.java:39) > at > org.apache.accumulo.core.file.rfile.bcfile.SimpleBufferedOutputStream.flush(SimpleBufferedOutputStream.java:68) > at > org.apache.hadoop.io.compress.CompressionOutputStream.flush(CompressionOutputStream.java:69) > at > 
org.apache.accumulo.core.file.rfile.bcfile.Compression$FinishOnFlushCompressionStream.flush(Compression.java:66) > at > java.io.BufferedOutputStream.flush(BufferedOutputStream.java:141) > at > org.apache.accumulo.core.file.rfile.bcfile.BCFile$Writer$WBlockState.finish(BCFile.java:233) > at > org.apache.accumulo.core.file.rfile.bcfile.BCFile$Writer$BlockAppender.close(BCFile.java:320) > at > org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$BlockWrite.close(CachableBlockFile.java:121) > at > org.apache.accumulo.core.file.rfile.RFile$Writer.closeBlock(RFile.java:398) > at > org.apache.accumulo.core.file.rfile.RFile$Writer.append(RFile.java:382) > at > org.apache.accumulo.tserver.tablet.Compactor.compactLocalityGroup(Compactor.java:356) > at > org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:214) > at > org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:1981) > at
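On the question of "nuking" the shutdown hook: the JVM only allows a registered hook to be removed by whoever kept the original Thread reference, which is why it may not be possible from outside AccumuloVFSClassLoader. A minimal stdlib sketch of the constraint (the cleanup action and names here are hypothetical stand-ins, not the actual AccumuloVFSClassLoader code):

```java
public class ShutdownHookSketch {
    public static void main(String[] args) {
        // Hypothetical stand-in for the VFS cleanup work done at JVM exit.
        Thread vfsCleanupHook = new Thread(() -> System.out.println("closing VFS file systems"));
        Runtime.getRuntime().addShutdownHook(vfsCleanupHook);

        // Removal succeeds only with the exact Thread that was registered;
        // code that never sees that reference cannot unregister the hook.
        boolean removed = Runtime.getRuntime().removeShutdownHook(vfsCleanupHook);
        System.out.println(removed); // prints "true"
    }
}
```

If AccumuloVFSClassLoader keeps its hook Thread in a field, it could expose a method to unregister it; otherwise the hook is effectively permanent for the life of the JVM.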
[jira] [Commented] (ACCUMULO-3783) Unexpected Filesystem Closed exceptions
[ https://issues.apache.org/jira/browse/ACCUMULO-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603254#comment-14603254 ] Dave Marion commented on ACCUMULO-3783: --- [~elserj] Take a look at the example config[1]. Basically, set general.classpaths and general.vfs.classpaths. You need '/.*.jar' if you only want to pick up jar files in that location. Short on time right now, but I can help solve issues if you have them over the weekend or later tonight. FYI, I have only deployed this with 1.6. Not sure if changes are necessary for 1.7 or 1.8, but happy to help. [1] https://github.com/apache/accumulo/blob/8c1d2d0c147220ca375006a8a7e7e481241651a7/assemble/conf/examples/vfs-classloader/accumulo-site.xml
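The two properties from the comment above live in accumulo-site.xml. A trimmed sketch of the shape, not the exact contents of the linked example (the HDFS URL and local entries are placeholders; note the trailing '/.*.jar' that restricts VFS loading to jar files in that directory):

```xml
<property>
  <name>general.classpaths</name>
  <value>
    $ACCUMULO_HOME/lib/accumulo-start.jar,
    $ACCUMULO_CONF_DIR,
    $HADOOP_CONF_DIR,
  </value>
</property>
<property>
  <name>general.vfs.classpaths</name>
  <value>hdfs://namenode:8020/accumulo/classpath/.*.jar</value>
</property>
```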
[jira] [Commented] (ACCUMULO-3783) Unexpected Filesystem Closed exceptions
[ https://issues.apache.org/jira/browse/ACCUMULO-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603585#comment-14603585 ] Dave Marion commented on ACCUMULO-3783: --- [~elserj] I think you need accumulo-start and the commons-vfs jar local, on each node. Take a look at the bootstrap_hdfs.sh script; it sets that up for you. I was trying to get to the point of versioning accumulo-start separately from the rest of Accumulo so that you could just drop a different version of Accumulo into HDFS and run with it. You would likely need to restart the TServers and other components, though, as they would not pick up changes in the parent classloader. Having said that, you can also keep all of the Accumulo jars local to each node and only put your application jars in HDFS. Not sure if you followed along during the development of this, but you can also have per-table classpaths, which is pretty cool. You define a named context, let's call it 'foo'. Then you configure your tables to use the 'foo' classpath context. 'foo' points to some directory in HDFS which has all of your application jars. Now, let's say that you want to test a new version of your iterators at scale. Define 'foo2', put the new jars in HDFS, clone your tables, change the clones' table classpath context to 'foo2', and now you can test at scale without taking up any extra space (assuming you are not ingesting; it would be really nice to be able to disable compactions on a clone). Let me know if you need any help...
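The per-table context workflow described above can be sketched as Accumulo shell commands. The context names, table name, and HDFS paths below are illustrative, and the property names are as they appear in the 1.6/1.7 line:

```
config -s general.vfs.context.classpath.foo=hdfs://namenode:8020/apps/foo/.*.jar
config -t mytable -s table.classpath.context=foo
config -s general.vfs.context.classpath.foo2=hdfs://namenode:8020/apps/foo2/.*.jar
clonetable mytable mytable_test
config -t mytable_test -s table.classpath.context=foo2
```

Because the clone shares the original table's files, switching its context to 'foo2' lets the new iterators run against real data without duplicating storage.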
[jira] [Commented] (ACCUMULO-3783) Unexpected Filesystem Closed exceptions
[ https://issues.apache.org/jira/browse/ACCUMULO-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603621#comment-14603621 ] Dave Marion commented on ACCUMULO-3783: --- Yes, VFS-487 is fixed also. I just wish they would release it. To be fair, VFS 2.0 does work and I have been using it in production for quite a while. Hopefully 2.1 will remove some of the annoyances.
[jira] [Comment Edited] (ACCUMULO-3783) Unexpected Filesystem Closed exceptions
[ https://issues.apache.org/jira/browse/ACCUMULO-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603621#comment-14603621 ] Dave Marion edited comment on ACCUMULO-3783 at 6/26/15 9:29 PM: Yes, VFS-487 is fixed also. I just wish they would release it. To be fair, VFS 2.0 does work and I have been using it in production for quite a while. Hopefully 2.1 will remove some of the annoyances. Edit: VFS-487 should resolve ACCUMULO-1507
[jira] [Commented] (ACCUMULO-3783) Unexpected Filesystem Closed exceptions
[ https://issues.apache.org/jira/browse/ACCUMULO-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603626#comment-14603626 ] Dave Marion commented on ACCUMULO-3783: --- That could be. We are using it with 1.6.
[jira] [Commented] (ACCUMULO-3783) Unexpected Filesystem Closed exceptions
[ https://issues.apache.org/jira/browse/ACCUMULO-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603633#comment-14603633 ] Dave Marion commented on ACCUMULO-3783: --- [~elserj] I just noticed VFS-570. Might be worth a look to make sure nothing gets broken.
[jira] [Commented] (ACCUMULO-3783) Unexpected Filesystem Closed exceptions
[ https://issues.apache.org/jira/browse/ACCUMULO-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604934#comment-14604934 ] Dave Marion commented on ACCUMULO-3783: --- I think that if you don't use the HDFS FileSystem cache, then you will have to call FileSystem.close() on the FileSystem object that is referenced from the HdfsFileObject. You should be able to do this by overriding AbstractFileObject.finalize(). > Unexpected Filesystem Closed exceptions > --- > > Key: ACCUMULO-3783 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3783 > Project: Accumulo > Issue Type: Bug > Components: master, start, tserver >Affects Versions: 1.7.0 >Reporter: Josh Elser >Assignee: Josh Elser > Fix For: 1.7.1, 1.8.0 > > Attachments: ACCUMULO-3783.patch > > > Noticed this in testing 1.7.0 on my laptop tonight. Started two tservers, one > continuous ingest client and would kill/restart one of the tservers > occasionally. > {noformat} > Failed to close map file > java.io.IOException: Filesystem closed > at > org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:795) > at > org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:629) > at java.io.FilterInputStream.close(FilterInputStream.java:181) > at > org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.close(CachableBlockFile.java:409) > at > org.apache.accumulo.core.file.rfile.RFile$Reader.close(RFile.java:921) > at > org.apache.accumulo.tserver.tablet.Compactor.compactLocalityGroup(Compactor.java:391) > at > org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:214) > at > org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:1981) > at > org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:2098) > at > org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:44) > at > org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > null > java.nio.channels.ClosedChannelException > at > org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1622) > at > org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:104) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > at > org.apache.accumulo.core.file.rfile.bcfile.SimpleBufferedOutputStream.flushBuffer(SimpleBufferedOutputStream.java:39) > at > org.apache.accumulo.core.file.rfile.bcfile.SimpleBufferedOutputStream.flush(SimpleBufferedOutputStream.java:68) > at > org.apache.hadoop.io.compress.CompressionOutputStream.flush(CompressionOutputStream.java:69) > at > org.apache.accumulo.core.file.rfile.bcfile.Compression$FinishOnFlushCompressionStream.flush(Compression.java:66) > at > java.io.BufferedOutputStream.flush(BufferedOutputStream.java:141) > at > org.apache.accumulo.core.file.rfile.bcfile.BCFile$Writer$WBlockState.finish(BCFile.java:233) > at > org.apache.accumulo.core.file.rfile.bcfile.BCFile$Writer$BlockAppender.close(BCFile.java:320) > at > org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$BlockWrite.close(CachableBlockFile.java:121) > at > org.apache.accumulo.core.file.rfile.RFile$Writer.closeBlock(RFile.java:398) > at > org.apache.accumulo.core.file.rfile.RFile$Writer.append(RFile.java:382) > at > org.apache.accumulo.tserver.tablet.Compactor.compactLocalityGroup(Compactor.java:356) > at > org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:214) > at > org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:1981) > at > 
org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:2098) > at > org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:44) > at > org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.Thre
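The cleanup Dave suggests — closing the non-cached FileSystem from an override of AbstractFileObject.finalize() — can be sketched with stand-in types. FakeFileSystem and HdfsFileObjectSketch below are placeholders invented for illustration, not the real Hadoop or commons-vfs classes; the actual patch would hold the FileSystem referenced by HdfsFileObject.

```java
// Sketch only: FakeFileSystem stands in for org.apache.hadoop.fs.FileSystem,
// HdfsFileObjectSketch for the commons-vfs HdfsFileObject. Names are invented.
class FakeFileSystem {
    private boolean closed = false;
    void close() { closed = true; }        // real FileSystem.close() releases the client
    boolean isClosed() { return closed; }
}

class HdfsFileObjectSketch {
    private final FakeFileSystem fs;       // the non-cached FileSystem handle
    HdfsFileObjectSketch(FakeFileSystem fs) { this.fs = fs; }

    // In the real patch this would be the AbstractFileObject subclass's
    // finalize(); the JVM invokes it when the file object becomes unreachable.
    @Override
    protected void finalize() { closeUnderlying(); }

    // Exposed so the same cleanup can also run deterministically.
    void closeUnderlying() { fs.close(); }
}

public class FinalizeSketch {
    public static void main(String[] args) {
        FakeFileSystem fs = new FakeFileSystem();
        HdfsFileObjectSketch obj = new HdfsFileObjectSketch(fs);
        obj.closeUnderlying();             // deterministic path; GC path is finalize()
        System.out.println(fs.isClosed()); // true
    }
}
```

Relying on finalize() alone is fragile (finalization timing is up to the GC), which is why the sketch also exposes an explicit close path.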
[jira] [Commented] (ACCUMULO-3783) Unexpected Filesystem Closed exceptions
[ https://issues.apache.org/jira/browse/ACCUMULO-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608643#comment-14608643 ] Dave Marion commented on ACCUMULO-3783: --- Did you happen to have more than one jar in HDFS in the classpath? If so, did you update more than one of them at the same time?
[jira] [Commented] (ACCUMULO-3783) Unexpected Filesystem Closed exceptions
[ https://issues.apache.org/jira/browse/ACCUMULO-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608879#comment-14608879 ] Dave Marion commented on ACCUMULO-3783: --- bq. IIRC, there were no other places in our code that we actually called close on the FileSystem when the TServer shuts down: TabletServer.run() -> VolumeManagerImpl.close() -> VolumeImpl.getFileSystem().close();
[jira] [Comment Edited] (ACCUMULO-3783) Unexpected Filesystem Closed exceptions
[ https://issues.apache.org/jira/browse/ACCUMULO-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608879#comment-14608879 ] Dave Marion edited comment on ACCUMULO-3783 at 6/30/15 7:08 PM: bq. IIRC, there were no other places in our code that we actually called close on the FileSystem when the TServer shuts down: TabletServer.run() -> VolumeManagerImpl.close() -> VolumeImpl.getFileSystem().close(); also in TabletGroupWatcher was (Author: dlmarion): bq. IIRC, there were no other places in our code that we actually called close on the FileSystem when the TServer shuts down: TabletServer.run() -> VolumeManagerImpl.close() -> VolumeImpl.getFileSystem().close();
[jira] [Commented] (ACCUMULO-3943) volumn definition agreement with default settings
[ https://issues.apache.org/jira/browse/ACCUMULO-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643018#comment-14643018 ] Dave Marion commented on ACCUMULO-3943: --- Blockpool ID? > volumn definition agreement with default settings > - > > Key: ACCUMULO-3943 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3943 > Project: Accumulo > Issue Type: Bug > Components: gc, master, tserver >Reporter: Eric Newton >Priority: Minor > Fix For: 1.8.0 > > > I was helping a new user trying to use Accumulo. They managed to set up HDFS, > running on hdfs://localhost:8020. But they didn't set it up with specific > settings, and just used the default port. Accumulo worked initially, but > would not allow a bulk import. > During the bulk import process, the servers need to move the files into the > accumulo volumes, but keeping the volume the same. This makes the move > efficient, since nothing is copied between namespaces. In this case it > refused the import because it could not find the correct volume. > Accumulo needs to be more nuanced when comparing hdfs://localhost:8020, and > hdfs://localhost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3943) volumn definition agreement with default settings
[ https://issues.apache.org/jira/browse/ACCUMULO-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643019#comment-14643019 ] Dave Marion commented on ACCUMULO-3943: --- I don't think we should rely on fs.defaultFS. It's somewhat useless in a cluster of machines with multiple HDFS clusters.
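The "more nuanced" comparison the issue asks for amounts to treating a missing port as the scheme's default before comparing volume URIs. A minimal sketch with java.net.URI, assuming 8020 as the HDFS NameNode default port (this is illustrative, not Accumulo's actual volume-comparison code):

```java
import java.net.URI;

public class VolumeCompare {
    static final int HDFS_DEFAULT_PORT = 8020; // assumed NameNode IPC default

    // Normalize a missing port to the default before comparing volumes.
    static boolean sameVolume(String a, String b) {
        URI ua = URI.create(a), ub = URI.create(b);
        return ua.getScheme().equals(ub.getScheme())
            && ua.getHost().equals(ub.getHost())
            && port(ua) == port(ub);
    }

    // URI.getPort() returns -1 when no port was given in the authority.
    static int port(URI u) {
        return u.getPort() == -1 ? HDFS_DEFAULT_PORT : u.getPort();
    }

    public static void main(String[] args) {
        // Plain URI equality treats these as different volumes...
        System.out.println(URI.create("hdfs://localhost:8020")
                .equals(URI.create("hdfs://localhost")));        // false
        // ...while default-port normalization reconciles them.
        System.out.println(sameVolume("hdfs://localhost:8020",
                                      "hdfs://localhost"));      // true
    }
}
```

As the comment notes, the default port should come from the scheme (or the relevant Hadoop configuration), not from fs.defaultFS, which is ambiguous with multiple HDFS clusters.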
[jira] [Commented] (ACCUMULO-2115) class loader issues when replacing jar that is actively being used
[ https://issues.apache.org/jira/browse/ACCUMULO-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645957#comment-14645957 ] Dave Marion commented on ACCUMULO-2115: --- Should try this with commons-vfs 2.1 snapshot and see if that fixes the problem > class loader issues when replacing jar that is actively being used > -- > > Key: ACCUMULO-2115 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2115 > Project: Accumulo > Issue Type: Bug > Components: tserver >Affects Versions: 1.4.4 >Reporter: Ivan Bella >Priority: Minor > Labels: class_loader > > When replacing a jar in the lib/ext directory while active iterators are > using the previous jar will result is many class loader issues. New > iterators started up may also show class loader issues. This will probably > persist until the previously active iterators complete and release the > previous ClassLoader instance, but that is merely a guess. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (ACCUMULO-2911) setscaniter and setshelliter unable to load class.
[ https://issues.apache.org/jira/browse/ACCUMULO-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion resolved ACCUMULO-2911. --- Resolution: Cannot Reproduce Closing this due to lack of response. Can re-open if it's still an issue. > setscaniter and setshelliter unable to load class. > -- > > Key: ACCUMULO-2911 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2911 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: David Medinets >Priority: Trivial > > Problem: > I can use a custom iterator using the setiter command but the same iterator > does not work using the setscaniter or setshelliter commands. > References: > https://blogs.apache.org/accumulo/entry/the_accumulo_classloader > http://accumulo.apache.org/1.5/examples/classpath.html > Description: > I am using my https://github.com/medined/D4M_Schema project to start > Accumulo. So the environment that I am using can be duplicated exactly if > needed. I am using > Accumulo: 1.5.0 > Hadoop: 1.2.1 > The classpath settings in accumulo-site.xml are the following (which I think > are the default): > > general.classpaths > > $ACCUMULO_HOME/server/target/classes/, > $ACCUMULO_HOME/core/target/classes/, > $ACCUMULO_HOME/start/target/classes/, > $ACCUMULO_HOME/examples/target/classes/, > $ACCUMULO_HOME/lib/[^.].$ACCUMULO_VERSION.jar, > $ACCUMULO_HOME/lib/[^.].*.jar, > $ZOOKEEPER_HOME/zookeeper[^.].*.jar, > $HADOOP_HOME/conf, > $HADOOP_HOME/[^.].*.jar, > $HADOOP_HOME/lib/[^.].*.jar, > > Classpaths that accumulo checks for updates and class > files. > When using the Security Manager, please remove the > ".../target/classes/" values. > > > I can load my iterator using setiter but not with setscaniter or setshelliter. 
> Here is my do-nothing iterator: > public class MyIterator extends WrappingIterator implements OptionDescriber { > @Override > public IteratorOptions describeOptions() { > String name = "dummy"; > String description = "Dummy Description"; > Map namedOptions = new HashMap(); > List unnamedOptionDescriptions = null; > return new IteratorOptions(name, description, namedOptions, > unnamedOptionDescriptions); > } > @Override > public boolean validateOptions(Map options) { > return true; > } > > } > I copy the jar file out to HDFS: > hadoop fs -mkdir /user/vagrant/d4m/classpath > hadoop fs -put /vagrant/schema/target/d4m_schema-0.0.1-SNAPSHOT.jar > /user/vagrant/classpath > I set the table-specific classpath context: > createtable atest > table atest > insert row cf cq value > config -s > general.vfs.context.classpath.d4m=hdfs://affy-master:9000/user/vagrant/classpath > config -t atest -s table.classpath.context=d4m > Now I can configure the iterator and scan over the single row without a > problem: > setiter -n MyIterator -p 10 -scan -minc -majc -class > com.codebits.d4m.iterator.MyIterator > scan > deleteiter -n MyIterator -scan -minc -majc > However, the setscaniter commands fails: > root@instance atest> setscaniter -n MyIterator -p 10 -class > com.codebits.d4m.iterator.MyIterator > 2014-06-15 02:54:14,098 [shell.Shell] WARN : Deprecated, use setshelliter > Dummy Description > 2014-06-15 02:54:14,126 [shell.Shell] ERROR: > org.apache.accumulo.core.util.shell.ShellCommandException: Command could not > be initialized (Unable to load com.codebits.d4m.iterator.MyIterator) > As does the setshelliter: > root@instance atest> setshelliter -pn d4m -n MyIterator -p 10 -class > com.codebits.d4m.iterator.MyIterator > Dummy Description > 2014-06-15 02:55:07,025 [shell.Shell] ERROR: > org.apache.accumulo.core.util.shell.ShellCommandException: Command could not > be initialized (Unable to load com.codebits.d4m.iterator.MyIterator) > I don't see any messages in the log files. 
[jira] [Commented] (ACCUMULO-3923) VFS ClassLoader doesnt' work with KeywordExecutable
[ https://issues.apache.org/jira/browse/ACCUMULO-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645961#comment-14645961 ] Dave Marion commented on ACCUMULO-3923: --- My plan was to complete ACCUMULO-3470 once VFS 2.1 is released. I have to remove the hdfs vfs objects that are in Accumulo, bump the dependency in the pom, and fix imports. None of this is client facing, so we should be able to backport where appropriate and if people need it. > VFS ClassLoader doesnt' work with KeywordExecutable > --- > > Key: ACCUMULO-3923 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3923 > Project: Accumulo > Issue Type: Bug >Reporter: Josh Elser >Priority: Critical > Fix For: 1.7.1, 1.8.0 > > > Trying to make the VFS classloading stuff work and it doesn't seem like > ServiceLoader is finding any of the KeywordExecutable implementations. > Best I can tell after looking into this, VFSClassLoader (created by > AccumuloVFSClassLoader) has all of the jars listed as resources, but when > ServiceLoader tries to find the META-INF/services definitions, it returns > nothing, and thus we think the keyword must be a class name. Seems like a > commons-vfs bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (ACCUMULO-2788) Classpath example missing regex
[ https://issues.apache.org/jira/browse/ACCUMULO-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion reassigned ACCUMULO-2788: - Assignee: Dave Marion > Classpath example missing regex > --- > > Key: ACCUMULO-2788 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2788 > Project: Accumulo > Issue Type: Bug > Components: docs >Affects Versions: 1.5.0, 1.5.1, 1.6.0 >Reporter: Charles Simpson >Assignee: Dave Marion >Priority: Trivial > Time Spent: 10m > Remaining Estimate: 0h > > The classpath example at > [https://accumulo.apache.org/1.5/examples/classpath.html], > [https://accumulo.apache.org/1.6/examples/classpath.html], and in > README.classpath shows a per-table classpath being set to a directory then > having a jar loaded out of that directory into the classloader. > The example should use a regex (i.e. {{config -s > general.vfs.context.classpath.cx1=hdfs://<host>:<port>/user1/lib/\[^.\].*.jar}}) or give the complete filename of the jar in > HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
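The pattern the issue recommends, [^.].*.jar, is interpreted by Accumulo's VFS classloader, but it can be sanity-checked with java.util.regex, which uses the same syntax. The jar filenames below are examples only; note the dots are unescaped, so the pattern is slightly broader than a literal ".jar" suffix check:

```java
import java.util.regex.Pattern;

public class ClasspathRegexCheck {
    public static void main(String[] args) {
        // The context value ends in a regex, not a bare directory:
        // [^.] = first character is not a dot, .* = anything, .jar = any
        // character followed by "jar" (dots unescaped, as in the ticket).
        Pattern jars = Pattern.compile("[^.].*.jar");
        System.out.println(jars.matcher("d4m_schema-0.0.1-SNAPSHOT.jar").matches()); // true
        System.out.println(jars.matcher(".hidden.jar").matches());                   // false
    }
}
```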
[jira] [Resolved] (ACCUMULO-2788) Classpath example missing regex
[ https://issues.apache.org/jira/browse/ACCUMULO-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion resolved ACCUMULO-2788. --- Resolution: Fixed Fix Version/s: 1.8.0
[jira] [Created] (ACCUMULO-3948) Enable A/B testing of scan iterators on a table
Dave Marion created ACCUMULO-3948: - Summary: Enable A/B testing of scan iterators on a table Key: ACCUMULO-3948 URL: https://issues.apache.org/jira/browse/ACCUMULO-3948 Project: Accumulo Issue Type: Improvement Components: tserver Reporter: Dave Marion Assignee: Dave Marion Classpath contexts are assigned to a table via the table configuration. You can test at scale by cloning your table and assigning a new classpath context to the cloned table. However, you would also need to change your application to use the new table names and since we cannot disable compactions you would start to consume more space in the filesystem for that table. We can support users passing in a context name to use for the scan on existing tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3948) Enable A/B testing of scan iterators on a table
[ https://issues.apache.org/jira/browse/ACCUMULO-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646018#comment-14646018 ] Dave Marion commented on ACCUMULO-3948: --- Another use of this feature would be to support rolling upgrades of a client application: two versions of an application could run against the same table at the same time, using different versions of scan iterators. MinC and MajC iterators would still use the context name applied to the table via the table configuration.
[jira] [Created] (ACCUMULO-3951) Provide some information about the Shell script command in the user manual
Dave Marion created ACCUMULO-3951: - Summary: Provide some information about the Shell script command in the user manual Key: ACCUMULO-3951 URL: https://issues.apache.org/jira/browse/ACCUMULO-3951 Project: Accumulo Issue Type: Task Components: docs Reporter: Dave Marion Assignee: Dave Marion Priority: Minor It was requested that I add some documentation about the script command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (ACCUMULO-3951) Provide some information about the Shell script command in the user manual
[ https://issues.apache.org/jira/browse/ACCUMULO-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion resolved ACCUMULO-3951. --- Resolution: Fixed Fix Version/s: 1.8.0
[jira] [Commented] (ACCUMULO-1013) Integrate with a scalable monitoring tool
[ https://issues.apache.org/jira/browse/ACCUMULO-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701103#comment-14701103 ] Dave Marion commented on ACCUMULO-1013: --- Personally I would like to see some API where I can provide an implementation to get the metrics off of each tserver and into some type of external monitoring tool. I'm thinking of something like StatsD. > Integrate with a scalable monitoring tool > - > > Key: ACCUMULO-1013 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1013 > Project: Accumulo > Issue Type: Sub-task > Components: monitor >Reporter: Eric Newton >Priority: Minor > Labels: gsoc2013, mentor > Fix For: 1.8.0 > > > The monitor is awesome. It should die. > I'm going to move other monitor tickets under this one (if I can), and create > some requirement tickets. > We would be better off putting our weight behind an existing monitoring > program which can scale, if one exists. > Hopefully we can combine tracing efforts and have a nicer distributed > trace-based tool, too. > For display functionality, lots of possibilities: Graphite, Cubism.js, D3.js > (really, any number of really slick Javascript graphing libraries). For log > collection, any number of distributed log management services out there too > can serve as inspiration for functionality: statsd, logstash, cacti/rrdtool. > Currently all of Accumulo monitoring information is exposed via JMX; a nice > balance could be found leveraging the existing monitoring capabilities with > JMXTrans (or equivalent) and applying a new GUI. > Familiarity with Java and JMX would be ideal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
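The StatsD integration Dave floats is attractive partly because the wire format is trivial: a plain UDP datagram of the form name:value|type, where type is c (counter), g (gauge), or ms (timer). A minimal sketch follows; the metric name and host are hypothetical, and this is not an Accumulo API:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class StatsDSketch {
    // Format a StatsD counter sample: "<name>:<value>|c".
    static String counter(String name, long value) {
        return name + ":" + value + "|c";
    }

    // Fire-and-forget UDP send, as StatsD clients do.
    static void send(String host, int port, String metric) throws Exception {
        byte[] payload = metric.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) {
        // "tserver.scans" is an invented metric name for illustration.
        System.out.println(counter("tserver.scans", 42)); // tserver.scans:42|c
    }
}
```

A pluggable API on the tserver would only need to hand samples like these to a user-supplied sink implementation.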
[jira] [Comment Edited] (ACCUMULO-1013) Integrate with a scalable monitoring tool
[ https://issues.apache.org/jira/browse/ACCUMULO-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701103#comment-14701103 ] Dave Marion edited comment on ACCUMULO-1013 at 8/18/15 11:33 AM: - Personally I would like to see some API where I can provide an implementation to get the metrics off of each tserver and into some type of external monitoring tool. I'm thinking of something like the StatsD protocol. was (Author: dlmarion): Personally I would like to see some API where I can provide an implementation to get the metrics off of each tserver and into some type of external monitoring tool. I'm thinking of something like StatsD.
[jira] [Commented] (ACCUMULO-1013) Integrate with a scalable monitoring tool
[ https://issues.apache.org/jira/browse/ACCUMULO-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701742#comment-14701742 ] Dave Marion commented on ACCUMULO-1013: --- Thanks. I did see the ticket for the Metrics2 integration, but have not yet looked at the implementation. Do you know if it is just exposing the metrics collected by the JMX MBeans, or a different set of metrics? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-1013) Integrate with a scalable monitoring tool
[ https://issues.apache.org/jira/browse/ACCUMULO-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701764#comment-14701764 ] Dave Marion commented on ACCUMULO-1013: --- Cool, thx. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3975) Deadlock by recursive scans
[ https://issues.apache.org/jira/browse/ACCUMULO-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716781#comment-14716781 ] Dave Marion commented on ACCUMULO-3975: --- Is this really a bug? I think we have always told people not to do scans from an iterator. Subsequent comments suggest creating a design document for a new feature. Suggest closing this and opening a new ticket to work on the design. > Deadlock by recursive scans > --- > > Key: ACCUMULO-3975 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3975 > Project: Accumulo > Issue Type: Bug > Components: mini, tserver >Affects Versions: 1.7.0 >Reporter: Dylan Hutchison > Fix For: 1.8.0 > > > A tablet server has a fixed size thread pool that it uses for scanning. The > maximum number of threads is controlled by > {{tserver.readahead.concurrent.max}}, which defaults to 16. > Take the use case of opening a Scanner inside of a server-side iterator. The > following results in deadlock. > 1. A client creates a BatchScanner (call this A) with enough query threads > (say, 16) that it uses up all the readahead threads on a single tablet > server. > 2. Inside the scan on that unlucky tablet server, an iterator opens a Scanner > (call these B) to tablets on the same tablet server. > 3. The Scanner Bs inside the iterators block because there is no free > readahead thread on the target tablet server to serve the request. They never > unblock. Essentially the tserver scan threads block on trying to obtain > tserver scan threads from the same thread pool. > The tablet server does not seem to recover from this event even after the > client disconnects (e.g. by killing the client). Not all the internalRead > threads appear to die by IOException, which can prevent subsequent scans with > smaller numbers of tablets from succeeding. It does recover on restarting > the tablet server. 
> The tablet server has some mechanism to increase the thread pool size at > {{rpc.TServerUtils.createSelfResizingThreadPool}}. It seems to be > ineffective. I see log messages like these: > {noformat} > 2015-08-26 21:35:24,247 [rpc.TServerUtils] INFO : Increasing server thread > pool size on TabletServer to 33 > 2015-08-26 21:35:25,248 [rpc.TServerUtils] INFO : Increasing server thread > pool size on TabletServer to 33 > 2015-08-26 21:35:26,250 [rpc.TServerUtils] INFO : Increasing server thread > pool size on TabletServer to 33 > 2015-08-26 21:35:27,252 [rpc.TServerUtils] INFO : Increasing server thread > pool size on TabletServer to 33 > {noformat} > Also a bunch of these pop up, in case it helps > {noformat} > 2015-08-26 21:38:29,417 [tserver.TabletServer] DEBUG: ScanSess tid > 127.0.0.1:40168 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] > 2015-08-26 21:38:34,428 [tserver.TabletServer] DEBUG: ScanSess tid > 127.0.0.1:40168 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] > 2015-08-26 21:38:39,433 [tserver.TabletServer] DEBUG: ScanSess tid > 127.0.0.1:40168 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] > 2015-08-26 21:38:44,266 [tserver.TabletServer] DEBUG: ScanSess tid > 127.0.0.1:38802 !0 0 entries in 0.00 secs, nbTimes = [2 2 2.00 1] > 2015-08-26 21:38:44,438 [tserver.TabletServer] DEBUG: ScanSess tid > 127.0.0.1:40168 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] > 2015-08-26 21:38:48,022 [tserver.TabletServer] DEBUG: MultiScanSess > 127.0.0.1:38802 0 entries in 0.02 secs (lookup_time:0.02 secs tablets:1 > ranges:1) > 2015-08-26 21:38:48,034 [tserver.TabletServer] DEBUG: MultiScanSess > 127.0.0.1:38802 0 entries in 0.01 secs (lookup_time:0.01 secs tablets:1 > ranges:1) > 2015-08-26 21:38:49,452 [tserver.TabletServer] DEBUG: ScanSess tid > 127.0.0.1:40168 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] > 2015-08-26 21:38:54,456 [tserver.TabletServer] DEBUG: ScanSess tid > 127.0.0.1:40168 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] > 
2015-08-26 21:38:59,473 [tserver.TabletServer] DEBUG: ScanSess tid > 127.0.0.1:40168 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] > 2015-08-26 21:39:04,484 [tserver.TabletServer] DEBUG: ScanSess tid > 127.0.0.1:40168 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] > {noformat} > I pushed a [test case that reproduces the deadlock in the Graphulo test > code|https://github.com/Accla/graphulo/blob/master/src/test/java/edu/mit/ll/graphulo/AccumuloBugTest.java#L47]. > It shows that when we use fewer threads than > {{tserver.readahead.concurrent.max}} (16), everything is okay, but if we use > more threads, deadlock occurs pretty reliably. > We can imagine a few kinds of solutions, such as fixing the self-increasing > thread pool mechanism that does not appear to work, or making re-entrant > thread pools. Let's find a simple solution. If I had my druth
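The deadlock pattern described in this ticket, where pool threads block waiting on work that can only run on the same exhausted pool, can be reproduced in isolation. The following is a self-contained sketch, not Accumulo code: a one-thread pool stands in for the readahead pool, and a timeout on Future.get() stands in for observing the wedged tserver.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the self-deadlock: a task on a fixed-size pool blocks waiting on
// a second task submitted to the same (now fully occupied) pool.
class PoolDeadlockDemo {
    static boolean deadlocks() {
        ExecutorService pool = Executors.newFixedThreadPool(1); // all "readahead" slots
        Future<String> outer = pool.submit(() -> {
            // Inner task can never start: the only worker thread is us.
            Future<String> inner = pool.submit(() -> "inner");
            return inner.get(); // blocks forever, like the inner Scanner
        });
        boolean wedged = false;
        try {
            outer.get(500, TimeUnit.MILLISECONDS);
        } catch (TimeoutException expected) {
            wedged = true; // the pool is stuck, as in the ticket
        } catch (Exception unexpected) {
            // any other failure means the demo did not reproduce the hang
        } finally {
            pool.shutdownNow();
        }
        return wedged;
    }
}
```

Re-entrant submission is safe only when the pool can grow unboundedly or the inner work runs on the caller's thread, which is why the proposed fixes center on the resizing mechanism or re-entrant pools.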
[jira] [Commented] (ACCUMULO-4019) thrift proxy no longer listening on all interfaces
[ https://issues.apache.org/jira/browse/ACCUMULO-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944257#comment-14944257 ] Dave Marion commented on ACCUMULO-4019: --- I think binding to all interfaces could be a security issue in some organizations. Can we make it configurable? Maybe default to all interfaces to fix the backwards compatibility issue? Then, file a ticket for a future release to change the default to localhost (forcing the user to change it)? > thrift proxy no longer listening on all interfaces > -- > > Key: ACCUMULO-4019 > URL: https://issues.apache.org/jira/browse/ACCUMULO-4019 > Project: Accumulo > Issue Type: Bug > Components: proxy >Affects Versions: 1.7.0 >Reporter: Adam Fuchs > > In updating the thrift proxy to use HostAndPort-style configuration, we > changed the behavior from listening on all interfaces to only listening on > the canonical host name interface. This broke the proxy for some users: > {code} > -TServer server = createProxyServer(AccumuloProxy.class, > ProxyServer.class, port, protoFactoryClass, opts.prop); > -server.serve(); > +HostAndPort address = > HostAndPort.fromParts(InetAddress.getLocalHost().getCanonicalHostName(), > port); > +ServerAddress server = createProxyServer(address, protoFactory, > opts.prop); > {code} > Does anybody know what prompted this change? To fix this, I think we should > hardcode it to listen to all interfaces. Would the correct way of doing that > be to use the following address?: > {code} > HostAndPort address = HostAndPort.fromParts("::", port); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
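One way to sketch the configurable-bind-address suggestion above: treat an absent, empty, or 0.0.0.0 setting as the wildcard address, preserving the old listen-on-all-interfaces behavior while letting security-conscious sites pin the proxy to one interface. The class and the idea of an empty-string default are assumptions for illustration, not the actual proxy code.

```java
import java.net.InetSocketAddress;

// Sketch of a configurable bind address (hypothetical helper, not proxy code):
// no configured host means the old behavior of listening on all interfaces.
class BindAddressConfig {
    static InetSocketAddress resolve(String configuredHost, int port) {
        if (configuredHost == null || configuredHost.isEmpty()
                || configuredHost.equals("0.0.0.0")) {
            // Wildcard address: bind every interface, the backwards-compatible default.
            return new InetSocketAddress(port);
        }
        // Otherwise bind only the interface the operator asked for.
        return new InetSocketAddress(configuredHost, port);
    }
}
```

A later release could then flip the default from the wildcard to localhost, as suggested, by changing only what is passed for configuredHost.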
[jira] [Commented] (ACCUMULO-4022) Create a concept of multi-homed tablets
[ https://issues.apache.org/jira/browse/ACCUMULO-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948734#comment-14948734 ] Dave Marion commented on ACCUMULO-4022: --- * Need to consider how to load balance scans between the read-write and read-only tablets. * Can there be more than one read-only copy of a tablet? > Create a concept of multi-homed tablets > --- > > Key: ACCUMULO-4022 > URL: https://issues.apache.org/jira/browse/ACCUMULO-4022 > Project: Accumulo > Issue Type: Wish > Components: client, tserver >Reporter: marco polo > Labels: newbie, performance > > I'm an accumulo newbie, but I wish to see the concept of multi-homed tablets. > This allows us to have tablets hosted by multiple servers, with only one > being writable against it. This concept would allow n receiver servers for a > tablet. An example might be a tablet that has become a hot spot could be > dynamically hosted elsewhere, and clients could pick this up as a potential. > Consistency must be kept between the hosts, as the initial read/write host > may compact or write to that tablet. > To me the larger problem may come from live ingest in which the write ahead > log has not been flushed. To avoid having to write to the read only servers > in a pipeline, we would likely need to create a model of enforcing reads only > after a flush of that tablet or a thrift interface to allow reading only the > data in memory to ensure consistency is enforced. I haven't given great > thought to solving this yet. > Please comment with ideas and pitfalls as I would like to see this wish come > to fruition with actionable tickets after some community thought. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ACCUMULO-3948) Enable A/B testing of scan iterators on a table
[ https://issues.apache.org/jira/browse/ACCUMULO-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion updated ACCUMULO-3948: -- Attachment: TestIterator.patch ACCUMULO-3948.1-6-3.patch I got this working with 1.6.3 (files attached for safekeeping); I still need to port this patch to 1.8.0. > Enable A/B testing of scan iterators on a table > --- > > Key: ACCUMULO-3948 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3948 > Project: Accumulo > Issue Type: Improvement > Components: tserver >Reporter: Dave Marion >Assignee: Dave Marion > Fix For: 1.8.0 > > Attachments: ACCUMULO-3948.1-6-3.patch, TestIterator.patch > > > Classpath contexts are assigned to a table via the table configuration. You > can test at scale by cloning your table and assigning a new classpath context > to the cloned table. However, you would also need to change your application > to use the new table names, and since we cannot disable compactions you would > start to consume more space in the filesystem for that table. We can support > users passing in a context name to use for the scan on existing tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ACCUMULO-4054) Add optional context name to TableOperations.testClassLoad
Dave Marion created ACCUMULO-4054: - Summary: Add optional context name to TableOperations.testClassLoad Key: ACCUMULO-4054 URL: https://issues.apache.org/jira/browse/ACCUMULO-4054 Project: Accumulo Issue Type: Task Affects Versions: 1.8.0 Reporter: Dave Marion Scanner.setContext() allows a user to set a context name that will override the context set on a table. We could fail fast at setup time if we could check the existence of the iterator in that context on the server side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ACCUMULO-4055) Add test in ConditionalWriterIT for setting context name on Scanner
Dave Marion created ACCUMULO-4055: - Summary: Add test in ConditionalWriterIT for setting context name on Scanner Key: ACCUMULO-4055 URL: https://issues.apache.org/jira/browse/ACCUMULO-4055 Project: Accumulo Issue Type: Task Components: test Affects Versions: 1.8.0 Reporter: Dave Marion Create a clone of ConditionalWriterIT.testIterators() that loads the iterators from a context not defined on a table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (ACCUMULO-3948) Enable A/B testing of scan iterators on a table
[ https://issues.apache.org/jira/browse/ACCUMULO-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion resolved ACCUMULO-3948. --- Resolution: Fixed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4057) Duplicated code in IteratorUtil
[ https://issues.apache.org/jira/browse/ACCUMULO-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009574#comment-15009574 ] Dave Marion commented on ACCUMULO-4057: --- It's not technically a duplicate; it's overloaded. The second method above has an extra parameter. > Duplicated code in IteratorUtil > --- > > Key: ACCUMULO-4057 > URL: https://issues.apache.org/jira/browse/ACCUMULO-4057 > Project: Accumulo > Issue Type: Improvement >Reporter: Josh Elser >Priority: Trivial > > Duplicated code in > https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/iterators/IteratorUtil.java#L236 > {code} > public static <K extends WritableComparable<?>,V extends Writable> > SortedKeyValueIterator<K,V> loadIterators(IteratorScope scope, > SortedKeyValueIterator<K,V> source, KeyExtent extent, > AccumuloConfiguration conf, List<IterInfo> ssiList, > Map<String,Map<String,String>> ssio, > IteratorEnvironment env, boolean useAccumuloClassLoader) throws > IOException { > List<IterInfo> iters = new ArrayList<IterInfo>(ssiList); > Map<String,Map<String,String>> allOptions = new > HashMap<String,Map<String,String>>(); > parseIteratorConfiguration(scope, iters, ssio, allOptions, conf); > return loadIterators(source, iters, allOptions, env, > useAccumuloClassLoader, conf.get(Property.TABLE_CLASSPATH)); > } > public static <K extends WritableComparable<?>,V extends Writable> > SortedKeyValueIterator<K,V> loadIterators(IteratorScope scope, > SortedKeyValueIterator<K,V> source, KeyExtent extent, > AccumuloConfiguration conf, List<IterInfo> ssiList, > Map<String,Map<String,String>> ssio, > IteratorEnvironment env, boolean useAccumuloClassLoader, String > classLoaderContext) throws IOException { > List<IterInfo> iters = new ArrayList<IterInfo>(ssiList); > Map<String,Map<String,String>> allOptions = new > HashMap<String,Map<String,String>>(); > parseIteratorConfiguration(scope, iters, ssio, allOptions, conf); > return loadIterators(source, iters, allOptions, env, > useAccumuloClassLoader, classLoaderContext); > } > {code} > I thought I had commented on https://github.com/apache/accumulo/pull/51 about > this, but maybe I forgot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
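A common way to remove this kind of near-duplication without changing behavior is to have the shorter overload delegate to the longer one, supplying the table's classpath context as the extra argument. The sketch below uses simplified stand-in signatures and values, not the real IteratorUtil types.

```java
// Simplified sketch of collapsing near-duplicate overloads: the shorter
// overload supplies the default context and delegates instead of repeating
// the whole method body. Names and string values are stand-ins.
class IteratorLoading {
    // The general form takes an explicit classloader context.
    static String loadIterators(String source, String context) {
        return "loaded:" + source + ":" + context; // stand-in for the real work
    }

    // The overload without a context delegates rather than duplicating the body.
    static String loadIterators(String source) {
        return loadIterators(source, defaultContext());
    }

    // Stand-in for conf.get(Property.TABLE_CLASSPATH).
    static String defaultContext() {
        return "table.classpath.context";
    }
}
```

With delegation, a later fix to the parsing logic only has to be made once, which is the practical cost of the duplication noted in the ticket.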
[jira] [Created] (ACCUMULO-4062) Change MutationSet.mutations to use HashSet
Dave Marion created ACCUMULO-4062: - Summary: Change MutationSet.mutations to use HashSet Key: ACCUMULO-4062 URL: https://issues.apache.org/jira/browse/ACCUMULO-4062 Project: Accumulo Issue Type: Improvement Components: client Reporter: Dave Marion Change TabletServerBatchWriter.MutationSet.mutations from a {code} HashMap<String,List<Mutation>> {code} to {code} HashMap<String,HashSet<Mutation>> {code} so that duplication mutations added by a client are not sent to the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4062) Change MutationSet.mutations to use HashSet
[ https://issues.apache.org/jira/browse/ACCUMULO-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014213#comment-15014213 ] Dave Marion commented on ACCUMULO-4062: --- Is there a reason not to do this? What am I missing? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ACCUMULO-4062) Change MutationSet.mutations to use HashSet
[ https://issues.apache.org/jira/browse/ACCUMULO-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion updated ACCUMULO-4062: -- Description: Change TabletServerBatchWriter.MutationSet.mutations from a {code} HashMap<String,List<Mutation>> {code} to {code} HashMap<String,HashSet<Mutation>> {code} so that duplicate mutations added by a client are not sent to the server. was: Change TabletServerBatchWriter.MutationSet.mutations from a {code} HashMap<String,List<Mutation>> {code} to {code} HashMap<String,HashSet<Mutation>> {code} so that duplication mutations added by a client are not sent to the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4062) Change MutationSet.mutations to use HashSet
[ https://issues.apache.org/jira/browse/ACCUMULO-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014259#comment-15014259 ] Dave Marion commented on ACCUMULO-4062: --- {code}org.apache.accumulo.core.data.Mutation#hashCode{code} in master looks like this for me: {code} @Override public int hashCode() { return serializedSnapshot().hashCode(); } {code} What code are you looking at? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4062) Change MutationSet.mutations to use HashSet
[ https://issues.apache.org/jira/browse/ACCUMULO-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014276#comment-15014276 ] Dave Marion commented on ACCUMULO-4062: --- Looks like it was done 4 weeks ago in the 1.6 branch and merged up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4062) Change MutationSet.mutations to use HashSet
[ https://issues.apache.org/jira/browse/ACCUMULO-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014307#comment-15014307 ] Dave Marion commented on ACCUMULO-4062: --- I think your comment about different ordering of updates might be valid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4062) Change MutationSet.mutations to use HashSet
[ https://issues.apache.org/jira/browse/ACCUMULO-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014337#comment-15014337 ] Dave Marion commented on ACCUMULO-4062: --- I don't think it renders this optimization useless, though; it just makes it less effective in certain cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4062) Change MutationSet.mutations to use HashSet
[ https://issues.apache.org/jira/browse/ACCUMULO-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014466#comment-15014466 ] Dave Marion commented on ACCUMULO-4062: --- I don't think that guarantee ever existed. A client writes K/V into an object (the batch writer in this case); they control the buffer size, write threads, and latency. We guarantee order on retrieval, not on insert, to my knowledge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4062) Change MutationSet.mutations to use HashSet
[ https://issues.apache.org/jira/browse/ACCUMULO-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014550#comment-15014550 ] Dave Marion commented on ACCUMULO-4062: --- So two rows with the exact same key components and timestamp, but with different values, and the last one in wins? That may work with a single client, but not across multiple clients. I'm not sure that is implied. I'm going to have to look at some more code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4062) Change MutationSet.mutations to use HashSet
[ https://issues.apache.org/jira/browse/ACCUMULO-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018153#comment-15018153 ] Dave Marion commented on ACCUMULO-4062: --- Looking into this further - I looked at the server side code {noformat} TabletServer.flush() -> CommitSession.commit() -> Tablet.commit() -> TabletMemory.mutate() -> CommitSession.mutate() -> InMemoryMap.mutate() {noformat} at this point it calls one of the SimpleMap.mutate() implementations, passing a list of mutations and a counter which gets incremented each time the SimpleMap.mutate() method is called. Looking at DefaultMap.mutate(), it creates a MemKey and adds it to a map that uses the MemKeyComparator. The MemKeyComparator uses the counter if the two keys are identical. Having said all of that, the order of the mutations does appear to be preserved as you indicate. However, this would only hold true if there is one client writing in that key space. If more than one client were writing in that key space, then I think the tablet server would apply them as they were received. Maybe some clients are counting on this behavior, but I don't think this behavior has been explicitly stated as being guaranteed. I don't want to break any clients that are counting on this working, but I would like to see if there is a way to dedupe on the client side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
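The MemKeyComparator tie-break described in the comment above (fall back to a per-mutate counter when two keys are otherwise identical) can be sketched generically. This is an illustration of the idea only, not the actual Accumulo comparator or MemKey class.

```java
import java.util.Comparator;

// Sketch of tie-breaking by insertion counter: when two keys compare equal,
// the counter (incremented per mutate() call, as described for InMemoryMap)
// decides the order deterministically.
class CountedKey {
    final String key;
    final long counter; // assigned at insert time
    CountedKey(String key, long counter) {
        this.key = key;
        this.counter = counter;
    }
}

class CountedKeyComparator implements Comparator<CountedKey> {
    @Override
    public int compare(CountedKey a, CountedKey b) {
        int cmp = a.key.compareTo(b.key);
        if (cmp != 0) {
            return cmp;
        }
        // Identical keys: the newer insertion (higher counter) sorts first,
        // mirroring last-write-wins on retrieval.
        return Long.compare(b.counter, a.counter);
    }
}
```

The important property is that the tie-break only reflects arrival order at one tablet server, which is why the ordering observation above breaks down with multiple concurrent clients.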
[jira] [Commented] (ACCUMULO-4062) Change MutationSet.mutations to use HashSet
[ https://issues.apache.org/jira/browse/ACCUMULO-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018446#comment-15018446 ] Dave Marion commented on ACCUMULO-4062: --- That's a good point. I assume that you mean when the versioning iterator has been removed? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (ACCUMULO-4062) Change MutationSet.mutations to use HashSet
[ https://issues.apache.org/jira/browse/ACCUMULO-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion resolved ACCUMULO-4062. --- Resolution: Won't Fix Based on this discussion, clients should deduplicate mutations, and the order in which updates are applied from the client to the server is not explicitly guaranteed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
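A minimal sketch of the client-side deduplication recommended in the resolution, using strings as stand-ins for serialized mutations (the Mutation#hashCode shown earlier in the thread hashes the serialized form, so equality by content is the assumption here):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch of deduplicating mutations on the client before handing them to a
// BatchWriter: a LinkedHashSet drops exact duplicates while preserving the
// first-insertion order. Strings stand in for real Mutation objects.
class ClientDedup {
    static List<String> dedupe(List<String> serializedMutations) {
        // LinkedHashSet keeps only the first occurrence of each element.
        return new ArrayList<>(new LinkedHashSet<>(serializedMutations));
    }
}
```

Deduplicating in the application keeps the batch writer's ordering semantics untouched while still avoiding sending identical mutations to the server twice.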
[jira] [Commented] (ACCUMULO-4079) ThriftTransportPool closes cached connections too aggressively
[ https://issues.apache.org/jira/browse/ACCUMULO-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061288#comment-15061288 ] Dave Marion commented on ACCUMULO-4079: --- See ACCUMULO-2069. Almost two years to the day :-). I don't think we did anything, but I will double check when I get back in to the office to see if we changed something internally. > ThriftTransportPool closes cached connections too aggressively > -- > > Key: ACCUMULO-4079 > URL: https://issues.apache.org/jira/browse/ACCUMULO-4079 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Eric Newton > > 3 seconds is a little fast. > [~dlmarion] do you know what timeout value you've been using in production > systems? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4090) BatchWriter close not cleaning up all resources
[ https://issues.apache.org/jira/browse/ACCUMULO-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069780#comment-15069780 ] Dave Marion commented on ACCUMULO-4090: --- Looking at a heap dump I consistently see two objects in the queue for the jtimer object, a FailedMutations object and an anonymous timer task. I believe the following should be done: 1. When TSBW.close() is called, then FailedMutations.cancel() should be called. 2. A reference should be kept to the TimerTask added to jtimer in the TSBW constructor. Then in TSBW.close() the cancel() method should be called on this task. Looking at the TabletServerBatchWriter objects in the heap dump I see that the closed field is always false. I wonder if the root cause is that this field is not marked as volatile (and the flushing field may be an issue too). > BatchWriter close not cleaning up all resources > --- > > Key: ACCUMULO-4090 > URL: https://issues.apache.org/jira/browse/ACCUMULO-4090 > Project: Accumulo > Issue Type: Bug > Components: client >Affects Versions: 1.7.0 >Reporter: Eric Newton >Assignee: Eric Newton > > I'm debugging an issue with a long-running ingestor, similar to the > TraceServer. > After realizing that BatchWriter close needs to be called when a > MutationsRejectedException occurs (see ACCUMULO-4088), a close was added, and > the client became more stable. > However, after a day, or so, the client became sluggish. When inspecting a > heap dump, many TabletServerBatchWriter objects were still referenced. This > server should only have two BatchWriter instances at any one time, and this > server had >100. > Still debugging. > The error that initiates the issue is a SessionID not found, presumably > because the session timed out. This is the cause of the > MutationsRejectedException seen by the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
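The two fixes proposed in the comment above (cancel FailedMutations on close, and keep a reference to the TimerTask so it too can be cancelled), plus the volatile concern, can be sketched as follows. Class and field names are illustrative, not the real TabletServerBatchWriter.

```java
import java.util.Timer;
import java.util.TimerTask;

// Sketch of the proposed cleanup: retain a reference to the scheduled
// TimerTask, cancel it in close(), and make the closed flag volatile so
// other threads observe the close promptly.
class WriterWithTimer {
    private final Timer jtimer = new Timer("batch-writer-timer", true);
    private final TimerTask latencyTask; // reference retained so close() can cancel it
    private volatile boolean closed = false; // volatile: visible across threads

    WriterWithTimer() {
        latencyTask = new TimerTask() {
            @Override
            public void run() {
                // stand-in for the periodic failed-mutation flush
            }
        };
        jtimer.schedule(latencyTask, 100, 100);
    }

    void close() {
        closed = true;
        latencyTask.cancel(); // without this, the Timer queue pins the writer in memory
        jtimer.cancel();      // also releases the timer thread itself
    }

    boolean isClosed() {
        return closed;
    }
}
```

An uncancelled TimerTask holds a strong reference to its enclosing writer from the Timer's queue, which matches the heap-dump observation of >100 retained TabletServerBatchWriter objects.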
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163437#comment-15163437 ] Dave Marion commented on ACCUMULO-1755: --- We could solve this by:
1. Making MutationSet.mutations a ConcurrentHashMap
2. Not synchronizing on access to TabletServerBatchWriter.mutations
3. Changing TabletServerBatchWriter.mutations to an AtomicReference so that it is safe to swap it out in startProcessing()
4. In startProcessing(), swap in a new MutationSet then add the mutations from the previous MutationSet to the writer.
> BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
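A minimal sketch of the proposal, using hypothetical simplified types rather than the real TabletServerBatchWriter internals: a MutationSet backed by a ConcurrentHashMap, held in an AtomicReference so startProcessing() can swap in a fresh set without a lock covering the whole add path.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for MutationSet: concurrent map plus atomic memory counter.
class MutationSetSketch {
    final Map<String, Long> mutations = new ConcurrentHashMap<>();
    final AtomicLong memoryUsed = new AtomicLong();

    void add(String row, long bytes) {
        mutations.merge(row, bytes, Long::sum);
        memoryUsed.addAndGet(bytes);
    }
}

// Hypothetical stand-in for the writer: no synchronization on addMutation().
class BinningSketch {
    private final AtomicReference<MutationSetSketch> current =
            new AtomicReference<>(new MutationSetSketch());

    void addMutation(String row, long bytes) {
        current.get().add(row, bytes);   // concurrent adds go to the live set
    }

    // startProcessing(): swap in an empty set and hand the old one to the binner
    MutationSetSketch startProcessing() {
        return current.getAndSet(new MutationSetSketch());
    }
}
```

Note that a thread which read the reference just before the swap may still add to the returned set for a brief window; a real implementation would need to account for that race.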
[jira] [Comment Edited] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163437#comment-15163437 ] Dave Marion edited comment on ACCUMULO-1755 at 2/24/16 6:07 PM: We could solve this by:
1. Making MutationSet.mutations a ConcurrentHashMap
2. Making MutationSet.memoryUsed an AtomicLong
3. Not synchronizing on access to TabletServerBatchWriter.mutations
4. Changing TabletServerBatchWriter.mutations to an AtomicReference so that it is safe to swap it out in startProcessing()
5. In startProcessing(), swap in a new MutationSet then add the mutations from the previous MutationSet to the writer.
was (Author: dlmarion): We could solve this by:
1. Making MutationSet.mutations a ConcurrentHashMap
2. Not synchronizing on access to TabletServerBatchWriter.mutations
3. Changing TabletServerBatchWriter.mutations to an AtomicReference so that it is safe to swap it out in startProcessing()
4. In startProcessing(), swap in a new MutationSet then add the mutations from the previous MutationSet to the writer.
> BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163460#comment-15163460 ] Dave Marion commented on ACCUMULO-1755: --- Maybe something for 2.0? I wasn't looking to do an entire rewrite, just remove some of the locking. > BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163513#comment-15163513 ] Dave Marion commented on ACCUMULO-1755: --- https://reviews.apache.org/r/43957/ > BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs >Assignee: Dave Marion > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163587#comment-15163587 ] Dave Marion commented on ACCUMULO-1755: --- Adam, I have this issue now where I have N clients sharing a batch writer. As you noted in the description, all the threads wait on binning mutations. I could use a batch writer per thread, and that may be the solution in the end. I think I can remove the synchronized modifier from addMutation, but I think in the end I may just be pushing the problem to an area of the code that the client has no control over. I'm interested in solving this issue, though; any time you can spare would be appreciated. > BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs >Assignee: Dave Marion > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163732#comment-15163732 ] Dave Marion commented on ACCUMULO-1755: --- No performance numbers, just seeing BLOCKED threads in a stack trace :-). I'll see what I can do about getting some performance numbers with and without my final patch. Do you think continuous ingest would be a good framework for this? > BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs >Assignee: Dave Marion > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion updated ACCUMULO-1755: -- Fix Version/s: 1.7.2 1.6.6 > BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs >Assignee: Dave Marion > Fix For: 1.6.6, 1.7.2, 1.8.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion resolved ACCUMULO-1755. --- Resolution: Fixed Committed to 1.6 and merged up to master. Built with 'mvn clean verify -DskipITs' on each branch and ran the new IT separately. > BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs >Assignee: Dave Marion > Fix For: 1.6.6, 1.7.2, 1.8.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion updated ACCUMULO-1755: -- Attachment: ACCUMULO-1755.patch Attaching original patch > BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs >Assignee: Dave Marion > Fix For: 1.6.6, 1.7.2, 1.8.0 > > Attachments: ACCUMULO-1755.patch > > Time Spent: 2h > Remaining Estimate: 0h > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176713#comment-15176713 ] Dave Marion commented on ACCUMULO-1755: --- I took the test that I created and ran it against master and my feature branch with 1 to 6 threads. I didn't see much difference, but looking back at it now I think it's because the test pre-creates all of the mutations and adds them as fast as possible. The test is really for multi-threaded correctness rather than performance. In the new code there is still a synchronization point when adding the binned mutations to the queues for the tablet servers. The send threads in the test (local mini accumulo cluster) must be able to keep up with the adding of the binned mutations. I don't expect that to be the case in a real deployment. Good news - performance wasn't worse. I think a better test is to write a simple multi-threaded client that creates and adds mutations to a common batch writer. Then, time the application as a whole trying to insert N mutations with 1 to N client threads. The previous implementation blocked all client threads from calling BatchWriter.addMutation(), meaning the clients could not do any work. In the new implementation the clients will be able to continue to do work, adding mutations, and even binning them in their own thread if necessary, before blocking. I'll see if I can re-test with this new approach in the next few days. Do you have a different thought about how to test this? 
> BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs >Assignee: Dave Marion > Fix For: 1.6.6, 1.7.2, 1.8.0 > > Attachments: ACCUMULO-1755.patch > > Time Spent: 2h > Remaining Estimate: 0h > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
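The multi-threaded client described in the comment above might be harnessed as the following sketch. SharedWriter is a hypothetical stand-in for a BatchWriter; its synchronized addMutation() models the contention being measured, and nothing here uses actual Accumulo APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a shared BatchWriter; the synchronized method
// models the lock that all client threads contend on.
class SharedWriter {
    private final AtomicLong count = new AtomicLong();

    synchronized void addMutation(String mutation) {
        count.incrementAndGet();   // real code would bin/queue the mutation here
    }

    long count() { return count.get(); }
}

// Times how long N threads take to push a fixed number of mutations through
// one shared writer, which is the whole-application measurement proposed above.
class IngestTimer {
    static long run(int threads, int mutationsPerThread) {
        SharedWriter writer = new SharedWriter();
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            Thread th = new Thread(() -> {
                for (int i = 0; i < mutationsPerThread; i++) {
                    writer.addMutation("row" + i);
                }
            });
            workers.add(th);
            th.start();
        }
        for (Thread th : workers) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (writer.count() != (long) threads * mutationsPerThread) {
            throw new IllegalStateException("lost mutations");
        }
        return elapsedMs;
    }
}
```

Running this for 1 to N threads and comparing elapsed times against both branches would give the per-implementation numbers the comment is asking for.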
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178398#comment-15178398 ] Dave Marion commented on ACCUMULO-1755: --- bq. The previous implementation blocked all client threads from calling BatchWriter.addMutation(), meaning the clients could not do any work. In the new implementation the clients will be able to continue to do work, adding mutations, and even binning them in their own thread if necessary, before blocking. My statement from above is incorrect. We didn't remove the synchronization from TabletServerBatchWriter.addMutation. We only made it such that the binning is done either in a background thread or the current thread. > BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs >Assignee: Dave Marion > Fix For: 1.6.6, 1.7.2, 1.8.0 > > Attachments: ACCUMULO-1755.patch > > Time Spent: 2h > Remaining Estimate: 0h > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180139#comment-15180139 ] Dave Marion commented on ACCUMULO-1755: --- [~afuchs] [~kturner] FWIW, I have been doing some testing locally. I have not been able to show any real performance improvement. Running an application with this patch still shows multiple client threads blocking on TabletServerBatchWriter.addMutation() because of the synchronization on that method. All this patch did was make 1 of the client threads execute that method faster. I think the real performance improvement will be removing the synchronization modifier from the addMutation method. > BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs >Assignee: Dave Marion > Fix For: 1.6.6, 1.7.2, 1.8.0 > > Attachments: ACCUMULO-1755.patch > > Time Spent: 2h > Remaining Estimate: 0h > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184994#comment-15184994 ] Dave Marion commented on ACCUMULO-1755: --- I wrote a new test that sends 1M mutations total using N threads with a BatchWriter buffer of different sizes. The test is run twice and the time discarded to account for JVM startup. Then the test is run 10 times and the average (in seconds) is reported for total time and time to add mutations. First, I added some code to the TSBW to determine that with my test data I was sending the following number of batches using 1, 10, and 100MB buffers:
||BatchWriter Max Memory Size||Flushes To Accumulo||
| 1MB | 515 |
| 10MB | 52 |
| 100MB | 6 |
Here are the results of the test:
h2. master branch
Using the patch 1755-perf-test.patch
Total Time
||Threads|| 1MB || 10MB || 100MB ||
| 1 | 3.121 | 2.818 | 3.158 |
| 2 | 3.102 | 2.414 | 2.950 |
| 4 | 3.367 | 2.573 | 3.114 |
| 8 | 3.422 | 2.569 | 3.140 |
| 16 | 3.590 | 2.741 | 3.332 |
Add Mutation Time
||Threads|| 1MB || 10MB || 100MB ||
| 1 | 3.114 | 2.733 | 2.498 |
| 2 | 3.088 | 2.350 | 2.371 |
| 4 | 3.360 | 2.506 | 2.472 |
| 8 | 3.414 | 2.516 | 2.509 |
| 16 | 3.582 | 2.692 | 2.696 |
h2. master branch with modifications to remove sync on addMutation()
I successfully modified the TSBW to remove the synchronization modifier from the addMutation method. The multi-threaded binning test passes so I have some confidence that the data is correct.
Using the patch 1755-nosync-perf-test.patch
Total Time
||Threads|| 1MB || 10MB || 100MB ||
| 1 | 3.080 | 2.766 | 3.255 |
| 2 | 2.972 | 2.420 | 3.137 |
| 4 | 3.162 | 2.492 | 3.190 |
| 8 | 3.100 | 2.658 | 3.623 |
| 16 | 3.393 | 2.898 | 3.743 |
Add Mutation Time
||Threads|| 1MB || 10MB || 100MB ||
| 1 | 3.072 | 2.653 | 2.517 |
| 2 | 2.965 | 2.371 | 2.527 |
| 4 | 3.155 | 2.441 | 2.589 |
| 8 | 3.092 | 2.602 | 2.961 |
| 16 | 3.385 | 2.839 | 2.891 |
I think the results are inconclusive. 
The tests run with MAC on localhost, so this is likely a best case scenario. I'd be interested to see this re-run on a real cluster. > BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs >Assignee: Dave Marion > Fix For: 1.6.6, 1.7.2, 1.8.0 > > Attachments: ACCUMULO-1755.patch > > Time Spent: 2h > Remaining Estimate: 0h > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion updated ACCUMULO-1755: -- Attachment: 1755-perf-test.patch 1755-nosync-perf-test.patch > BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs >Assignee: Dave Marion > Fix For: 1.6.6, 1.7.2, 1.8.0 > > Attachments: 1755-nosync-perf-test.patch, 1755-perf-test.patch, > ACCUMULO-1755.patch > > Time Spent: 2h > Remaining Estimate: 0h > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ACCUMULO-4169) TabletServer.config contextCleaner removes contexts that are not set on a table
Dave Marion created ACCUMULO-4169: - Summary: TabletServer.config contextCleaner removes contexts that are not set on a table Key: ACCUMULO-4169 URL: https://issues.apache.org/jira/browse/ACCUMULO-4169 Project: Accumulo Issue Type: Bug Components: tserver Affects Versions: 1.8.0 Reporter: Dave Marion -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ACCUMULO-4169) TabletServer.config contextCleaner removes contexts that are not set on a table
[ https://issues.apache.org/jira/browse/ACCUMULO-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion updated ACCUMULO-4169: -- Description: ACCUMULO-3948 added a feature where you could define a context in the Accumulo configuration, not set it on a table, and use it in a Scanner. However, there is a runnable created in TabletServer.config() that runs every 60 seconds that closes contexts that are not defined on a table. I suggest that the context cleaner not close any context defined in the configuration. > TabletServer.config contextCleaner removes contexts that are not set on a > table > --- > > Key: ACCUMULO-4169 > URL: https://issues.apache.org/jira/browse/ACCUMULO-4169 > Project: Accumulo > Issue Type: Bug > Components: tserver >Affects Versions: 1.8.0 >Reporter: Dave Marion > > ACCUMULO-3948 added a feature where you could define a context in the > Accumulo configuration, not set it on a table, and use it in a Scanner. > However, there is a runnable created in TabletServer.config() that runs every > 60 seconds that closes contexts that are not defined on a table. I suggest > that the context cleaner not close any context defined in the > configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
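The suggested change amounts to keying the cleaner off the set of configured contexts instead of the set of contexts assigned to tables. A minimal illustration with hypothetical names (the real cleaner is a runnable created in TabletServer.config() and works against the VFS context manager):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; ContextCleanerSketch and openContexts are
// hypothetical stand-ins for the tserver's context manager state.
class ContextCleanerSketch {
    // contexts currently held open by the context manager
    final Map<String, Object> openContexts = new HashMap<>();

    // Proposed behavior: close only contexts that are no longer defined in the
    // configuration, rather than every context not set on a table.
    void clean(Set<String> configuredContexts) {
        openContexts.keySet().removeIf(name -> !configuredContexts.contains(name));
    }
}
```

With this policy, a context defined in the configuration but used only by Scanners (the ACCUMULO-3948 feature) would survive the periodic 60-second sweep.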
[jira] [Commented] (ACCUMULO-4169) TabletServer.config contextCleaner removes contexts that are not set on a table
[ https://issues.apache.org/jira/browse/ACCUMULO-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208679#comment-15208679 ] Dave Marion commented on ACCUMULO-4169: --- "in use" for a context would, I think, mean the call to Context.getClassLoader() by IteratorUtil.loadIterators or something else. You are not suggesting that we track the use of the actual classloader that is returned from the context are you? > TabletServer.config contextCleaner removes contexts that are not set on a > table > --- > > Key: ACCUMULO-4169 > URL: https://issues.apache.org/jira/browse/ACCUMULO-4169 > Project: Accumulo > Issue Type: Bug > Components: tserver >Affects Versions: 1.8.0 >Reporter: Dave Marion > > ACCUMULO-3948 added a feature where you could define a context in the > Accumulo configuration, not set it on a table, and use it in a Scanner. > However, there is a runnable created n TabletServer.config() that runs every > 60 seconds that closes context that are not defined on a table. Suggesting > that we have the context cleaner not close any context defined in the > configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4169) TabletServer.config contextCleaner removes contexts that are not set on a table
[ https://issues.apache.org/jira/browse/ACCUMULO-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208870#comment-15208870 ] Dave Marion commented on ACCUMULO-4169: --- bq. Do we have an API now for creating the context? The context object is created under the covers and has no direct API. A user can define/undefine a context via the Accumulo configuration. > TabletServer.config contextCleaner removes contexts that are not set on a > table > --- > > Key: ACCUMULO-4169 > URL: https://issues.apache.org/jira/browse/ACCUMULO-4169 > Project: Accumulo > Issue Type: Bug > Components: tserver >Affects Versions: 1.8.0 >Reporter: Dave Marion > > ACCUMULO-3948 added a feature where you could define a context in the > Accumulo configuration, not set it on a table, and use it in a Scanner. > However, there is a runnable created n TabletServer.config() that runs every > 60 seconds that closes context that are not defined on a table. Suggesting > that we have the context cleaner not close any context defined in the > configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4169) TabletServer.config contextCleaner removes contexts that are not set on a table
[ https://issues.apache.org/jira/browse/ACCUMULO-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210599#comment-15210599 ] Dave Marion commented on ACCUMULO-4169: --- bq. I'm not sure what kind of protection we could provide to prevent clients in a multi-tenant environment from tanking the entire system. An administrator still has to define the context in the configuration. Regarding PermGen, ACCUMULO-599 highlighted some fixes, and I think the right solution, in Java 7 anyway, is to use -XX:+CMSClassUnloadingEnabled. > TabletServer.config contextCleaner removes contexts that are not set on a > table > --- > > Key: ACCUMULO-4169 > URL: https://issues.apache.org/jira/browse/ACCUMULO-4169 > Project: Accumulo > Issue Type: Bug > Components: tserver >Affects Versions: 1.8.0 >Reporter: Dave Marion > > ACCUMULO-3948 added a feature where you could define a context in the > Accumulo configuration, not set it on a table, and use it in a Scanner. > However, there is a runnable created n TabletServer.config() that runs every > 60 seconds that closes context that are not defined on a table. Suggesting > that we have the context cleaner not close any context defined in the > configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ACCUMULO-4173) Balance table within a set of hosts
Dave Marion created ACCUMULO-4173: - Summary: Balance table within a set of hosts Key: ACCUMULO-4173 URL: https://issues.apache.org/jira/browse/ACCUMULO-4173 Project: Accumulo Issue Type: Bug Components: master Reporter: Dave Marion Assignee: Dave Marion Fix For: 1.8.0 Create a table balancer that will provide a set of hosts for the table tablet balancer to use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4169) TabletServer.config contextCleaner removes contexts that are not set on a table
[ https://issues.apache.org/jira/browse/ACCUMULO-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216092#comment-15216092 ] Dave Marion commented on ACCUMULO-4169: --- bq. A possible work around on a running system is to add the context to an empty table to lock it in memory. This workaround only works if you create splits for the empty table and a split is hosted on each tserver. > TabletServer.config contextCleaner removes contexts that are not set on a > table > --- > > Key: ACCUMULO-4169 > URL: https://issues.apache.org/jira/browse/ACCUMULO-4169 > Project: Accumulo > Issue Type: Bug > Components: tserver >Affects Versions: 1.8.0 >Reporter: Dave Marion > > ACCUMULO-3948 added a feature where you could define a context in the > Accumulo configuration, not set it on a table, and use it in a Scanner. > However, there is a runnable created n TabletServer.config() that runs every > 60 seconds that closes context that are not defined on a table. Suggesting > that we have the context cleaner not close any context defined in the > configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)