[jira] [Work started] (HBASE-28056) [HBoss] add support for AWS v2 SDK
[ https://issues.apache.org/jira/browse/HBASE-28056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-28056 started by Steve Loughran.

> [HBoss] add support for AWS v2 SDK
> ----------------------------------
>
> Key: HBASE-28056
> URL: https://issues.apache.org/jira/browse/HBASE-28056
> Project: HBase
> Issue Type: Bug
> Components: hboss
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
>
> HBoss doesn't compile against a version of hadoop built with the AWS v2 SDK,
> which HADOOP-18703 will do on hadoop trunk within a few days.
> I think the solution here is probably some profile to build against different
> sdk/hadoop versions

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
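The "profile per sdk/hadoop version" idea mentioned above could look roughly like this pom.xml fragment. It is a sketch only: the profile ids and hadoop version numbers are hypothetical and not taken from the actual hbase-filesystem build.

```xml
<!-- Sketch only: hypothetical profile ids and hadoop versions. -->
<profiles>
  <!-- Build against a hadoop release whose s3a uses the AWS v1 SDK. -->
  <profile>
    <id>sdk-v1</id>
    <properties>
      <hadoop.version>3.3.6</hadoop.version>
    </properties>
  </profile>
  <!-- Build against a hadoop version containing HADOOP-18703 (AWS v2 SDK). -->
  <profile>
    <id>sdk-v2</id>
    <properties>
      <hadoop.version>3.4.0-SNAPSHOT</hadoop.version>
    </properties>
  </profile>
</profiles>
```

A profile would then be selected at build time, e.g. `mvn -Psdk-v2 install`.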
[jira] [Commented] (HBASE-28056) [HBoss] add support for AWS v2 SDK
[ https://issues.apache.org/jira/browse/HBASE-28056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765230#comment-17765230 ]

Steve Loughran commented on HBASE-28056:
----------------------------------------

FYI, I have got things compiling, but the mock tests are failing as the API is being used differently. HADOOP-1 makes things slightly easier, but HADOOP-18877 should provide an easier plugin point to put the stub s3fs behind, as all calls to S3 will be behind an internal interface: what you need to implement becomes a lot clearer.
[jira] [Updated] (HBASE-28056) [HBoss] add support for AWS v2 SDK
[ https://issues.apache.org/jira/browse/HBASE-28056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HBASE-28056:
-----------------------------------
    Description:
HBoss doesn't compile against a version of hadoop built with the AWS v2 SDK, which HADOOP-18703 will do on hadoop trunk within a few days.

I think the solution here is probably some profile to build against different sdk/hadoop versions

  was:
HBoss doesn't compile against a version of hadoop built with the AWS v2 SDK, which HADOOP-18703 will do on hadoop trunk within a few days.
[jira] [Commented] (HBASE-28056) [HBoss] add support for AWS v2 SDK
[ https://issues.apache.org/jira/browse/HBASE-28056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760881#comment-17760881 ]

Steve Loughran commented on HBASE-28056:
----------------------------------------

{code}
20:40:56 2023-08-30 19:40:56 - ERROR-root::util|431:: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.7.0:compile (default-compile) on project hadoop-testutils: Compilation failure: Compilation failure:
20:40:56 2023-08-30 19:40:56 - ERROR-root::util|431:: [ERROR] /.../hbase_filesystem/hadoop-testutils/src/main/java/org/apache/hadoop/hbase/oss/Hadoop33EmbeddedS3ClientFactory.java:[33,8] org.apache.hadoop.hbase.oss.Hadoop33EmbeddedS3ClientFactory is not abstract and does not override abstract method createS3TransferManager(software.amazon.awssdk.services.s3.S3AsyncClient) in org.apache.hadoop.fs.s3a.S3ClientFactory
20:40:56 2023-08-30 19:40:56 - ERROR-root::util|431:: [ERROR] /.../hbase_filesystem/hadoop-testutils/src/main/java/org/apache/hadoop/hbase/oss/Hadoop33EmbeddedS3ClientFactory.java:[56,19] createS3Client(java.net.URI,org.apache.hadoop.fs.s3a.S3ClientFactory.S3ClientCreationParameters) in org.apache.hadoop.hbase.oss.Hadoop33EmbeddedS3ClientFactory cannot implement createS3Client(java.net.URI,org.apache.hadoop.fs.s3a.S3ClientFactory.S3ClientCreationParameters) in org.apache.hadoop.fs.s3a.S3ClientFactory
20:40:56 2023-08-30 19:40:56 - ERROR-root::util|431:: [ERROR] return type com.amazonaws.services.s3.AmazonS3 is not compatible with software.amazon.awssdk.services.s3.S3Client
20:40:56 2023-08-30 19:40:56 - ERROR-root::util|431:: [ERROR] -> [Help 1]
{code}
[jira] [Created] (HBASE-28056) [HBoss] add support for AWS v2 SDK
Steve Loughran created HBASE-28056:
-----------------------------------

Summary: [HBoss] add support for AWS v2 SDK
Key: HBASE-28056
URL: https://issues.apache.org/jira/browse/HBASE-28056
Project: HBase
Issue Type: Bug
Components: hboss
Reporter: Steve Loughran
Assignee: Steve Loughran

HBoss doesn't compile against a version of hadoop built with the AWS v2 SDK, which HADOOP-18703 will do on hadoop trunk within a few days.
[jira] [Work started] (HBASE-27900) [HBOSS] Open file fails with NumberFormatException for S3AFileSystem
[ https://issues.apache.org/jira/browse/HBASE-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-27900 started by Steve Loughran.

> [HBOSS] Open file fails with NumberFormatException for S3AFileSystem
> --------------------------------------------------------------------
>
> Key: HBASE-27900
> URL: https://issues.apache.org/jira/browse/HBASE-27900
> Project: HBase
> Issue Type: Bug
> Components: Filesystem Integration
> Affects Versions: 1.0.0-alpha2
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
>
> In HADOOP-18724 it is shown that the new overloaded setters for double and
> float can cause a type mismatch and end up setting s3a integer options to
> float values.
> The wrapper in HBASE-26483 should mimic the hadoop fix and, for all
> double/float values passed in, cast to long and set as integers only. Nothing
> has ever used the float/double values, so this isn't a regression.
[jira] [Created] (HBASE-27900) [HBOSS] Open file fails with NumberFormatException for S3AFileSystem
Steve Loughran created HBASE-27900:
-----------------------------------

Summary: [HBOSS] Open file fails with NumberFormatException for S3AFileSystem
Key: HBASE-27900
URL: https://issues.apache.org/jira/browse/HBASE-27900
Project: HBase
Issue Type: Bug
Components: Filesystem Integration
Affects Versions: 1.0.0-alpha2
Reporter: Steve Loughran
Assignee: Steve Loughran

In HADOOP-18724 it is shown that the new overloaded setters for double and float can cause a type mismatch and end up setting s3a integer options to float values.

The wrapper in HBASE-26483 should mimic the hadoop fix and, for all double/float values passed in, cast to long and set as integers only. Nothing has ever used the float/double values, so this isn't a regression.
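The "cast to long and set as integers only" fix can be sketched as below. The class and option names are invented for illustration; this is not the actual HBoss wrapper or hadoop code, just the idea: a double-valued setter must never store a string like "65536.0" that a downstream integer parser will choke on.

```java
// Sketch: store float/double builder options as integer strings so that
// downstream code which parses the value with Integer.parseInt() never
// sees "65536.0" and throws NumberFormatException.
import java.util.HashMap;
import java.util.Map;

public class SafeOptions {
    private final Map<String, String> opts = new HashMap<>();

    // Naive setter: stores "65536.0", which breaks integer parsers downstream.
    public void optUnsafe(String key, double value) {
        opts.put(key, Double.toString(value));
    }

    // Mimic the hadoop fix: cast to long and store an integer string only.
    public void opt(String key, double value) {
        opts.put(key, Long.toString((long) value));
    }

    // Downstream consumer that only understands integers.
    public int getInt(String key) {
        return Integer.parseInt(opts.get(key));
    }
}
```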
[jira] [Created] (HBASE-27076) [HBOSS] compile against hadoop 3.3.2+ only
Steve Loughran created HBASE-27076:
-----------------------------------

Summary: [HBOSS] compile against hadoop 3.3.2+ only
Key: HBASE-27076
URL: https://issues.apache.org/jira/browse/HBASE-27076
Project: HBase
Issue Type: Improvement
Components: hboss
Reporter: Steve Loughran

To get openFile and other things to work safely, hboss needs to be changed so that it builds against hadoop 3.3 only.
[jira] [Work started] (HBASE-26483) [HBOSS] add lock around openFile operation
[ https://issues.apache.org/jira/browse/HBASE-26483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-26483 started by Steve Loughran.

> [HBOSS] add lock around openFile operation
> ------------------------------------------
>
> Key: HBASE-26483
> URL: https://issues.apache.org/jira/browse/HBASE-26483
> Project: HBase
> Issue Type: Improvement
> Components: hboss
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
>
> The HBoss FS wrapper doesn't wrap the openFile(path) call with a lock, which
> means anything using that builder isn't going to have access synchronized.
> Adding a wrapper for this method will allow hbase to use the api call and so
> request different read policies on different files, or other options.
[jira] [Assigned] (HBASE-26483) [HBOSS] add lock around openFile operation
[ https://issues.apache.org/jira/browse/HBASE-26483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran reassigned HBASE-26483:
--------------------------------------
    Assignee: Steve Loughran
[jira] [Commented] (HBASE-27042) hboss doesn't compile against hadoop branch-3.3 now that s3guard is cut
[ https://issues.apache.org/jira/browse/HBASE-27042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541673#comment-17541673 ]

Steve Loughran commented on HBASE-27042:
----------------------------------------

Thanks; I will follow up with the other issues next week/month.

> hboss doesn't compile against hadoop branch-3.3 now that s3guard is cut
> -----------------------------------------------------------------------
>
> Key: HBASE-27042
> URL: https://issues.apache.org/jira/browse/HBASE-27042
> Project: HBase
> Issue Type: Bug
> Components: hboss
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
> Fix For: hbase-filesystem-1.0.0-alpha2
>
> HBoss doesn't compile against hadoop builds containing HADOOP-17409, "remove
> s3guard", as test setup tries to turn it off.
> There's no need for s3guard any more, so hboss can just avoid all settings
> and expect it to be disabled (hadoop 3.3.3 or earlier) or removed (3.4+).
> (hboss version is 1.0.0-alpha2-SNAPSHOT)
[jira] [Commented] (HBASE-27042) hboss doesn't compile against hadoop branch-3.3 now that s3guard is cut
[ https://issues.apache.org/jira/browse/HBASE-27042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537531#comment-17537531 ]

Steve Loughran commented on HBASE-27042:
----------------------------------------

An update of the AWS SDK also breaks the tests, because the AWS client now requires another method to be implemented:

{code}
[ERROR] testListFilesEmptyDirectoryNonrecursive(org.apache.hadoop.hbase.oss.contract.TestHBOSSContractGetFileStatus) Time elapsed: 4.248 s <<< ERROR!
java.lang.UnsupportedOperationException: Extend AbstractAmazonS3 to provide an implementation
{code}

Once you tell maven to give you useful stack traces, you can track this down:

{code}
[ERROR] testListLocatedStatusEmptyDirectory(org.apache.hadoop.hbase.oss.contract.TestHBOSSContractGetFileStatus) Time elapsed: 1.365 s <<< ERROR!
java.lang.UnsupportedOperationException: Extend AbstractAmazonS3 to provide an implementation
	at com.amazonaws.services.s3.AbstractAmazonS3.deleteObject(AbstractAmazonS3.java:642)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$null$13(S3AFileSystem.java:2696)
	at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:464)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObject$14(S3AFileSystem.java:2694)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:414)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:377)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObject(S3AFileSystem.java:2690)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjectAtPath(S3AFileSystem.java:2725)
	at org.apache.hadoop.fs.s3a.S3AFileSystem$OperationCallbacksImpl.lambda$deleteObjectAtPath$0(S3AFileSystem.java:2055)
	at org.apache.hadoop.fs.s3a.Invoker.lambda$once$0(Invoker.java:135)
	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:117)
	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:133)
	at org.apache.hadoop.fs.s3a.S3AFileSystem$OperationCallbacksImpl.deleteObjectAtPath(S3AFileSystem.java:2054)
{code}
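One way of "telling maven to give you useful stack traces" is to stop surefire trimming them. The fragment below is a hypothetical addition to the project's pom.xml, assuming the build runs tests through maven-surefire-plugin:

```xml
<!-- Hypothetical pom.xml fragment: print full, untrimmed stack traces
     on test failure. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <trimStackTrace>false</trimStackTrace>
  </configuration>
</plugin>
```

The same switch can be passed on the command line as `-DtrimStackTrace=false`.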
[jira] [Created] (HBASE-27042) hboss doesn't compile against hadoop branch-3.3 now that s3guard is cut
Steve Loughran created HBASE-27042:
-----------------------------------

Summary: hboss doesn't compile against hadoop branch-3.3 now that s3guard is cut
Key: HBASE-27042
URL: https://issues.apache.org/jira/browse/HBASE-27042
Project: HBase
Issue Type: Bug
Components: hboss
Reporter: Steve Loughran

HBoss doesn't compile against hadoop builds containing HADOOP-17409, "remove s3guard", as test setup tries to turn it off.

There's no need for s3guard any more, so hboss can just avoid all settings and expect it to be disabled (hadoop 3.3.3 or earlier) or removed (3.4+).

(hboss version is 1.0.0-alpha2-SNAPSHOT)
[jira] [Created] (HBASE-26483) [HBOSS] add lock around openFile operation
Steve Loughran created HBASE-26483:
-----------------------------------

Summary: [HBOSS] add lock around openFile operation
Key: HBASE-26483
URL: https://issues.apache.org/jira/browse/HBASE-26483
Project: HBase
Issue Type: Improvement
Components: hboss
Reporter: Steve Loughran

The HBoss FS wrapper doesn't wrap the openFile(path) call with a lock, which means anything using that builder isn't going to have access synchronized. Adding a wrapper for this method will allow hbase to use the api call and so request different read policies on different files, or other options.
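The wrapper described above might look roughly like the following. A `ReentrantLock` stands in for the ZooKeeper-based lock HBoss actually uses, and the class and method shapes are invented for illustration, not taken from the HBoss source:

```java
// Sketch: serialize an openFile()-style call through a lock, so that the
// wrapped operation gets the same synchronized access the other wrapped
// FileSystem operations have.
import java.util.concurrent.locks.ReentrantLock;

public class LockingFs {
    private final ReentrantLock pathLock = new ReentrantLock();

    // Hypothetical wrapped operation: hold the lock for the duration of the
    // call so concurrent callers see a consistent view of the path.
    public String openFile(String path) {
        pathLock.lock();
        try {
            return "opened:" + path;  // stand-in for the real builder call
        } finally {
            pathLock.unlock();
        }
    }
}
```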
[jira] [Commented] (HBASE-24989) [HBOSS] Some code cleanup
[ https://issues.apache.org/jira/browse/HBASE-24989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448023#comment-17448023 ]

Steve Loughran commented on HBASE-24989:
----------------------------------------

HADOOP-17409 is going to break EmbeddedS3 as we cut all of the s3guard classes. hboss is going to need to remove the bit where s3guard is set up; it's not needed anyway.

> [HBOSS] Some code cleanup
> -------------------------
>
> Key: HBASE-24989
> URL: https://issues.apache.org/jira/browse/HBASE-24989
> Project: HBase
> Issue Type: Improvement
> Affects Versions: hbase-filesystem-1.0.0-alpha1
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Trivial
> Fix For: hbase-filesystem-1.0.0-alpha2
>
> This is a cleanup of unused methods/imported classes around several classes
> of the HBOSS project.
[jira] [Created] (HBASE-25900) HBoss tests compile/failure against Hadoop 3.3.1
Steve Loughran created HBASE-25900:
-----------------------------------

Summary: HBoss tests compile/failure against Hadoop 3.3.1
Key: HBASE-25900
URL: https://issues.apache.org/jira/browse/HBASE-25900
Project: HBase
Issue Type: Bug
Components: Filesystem Integration
Affects Versions: 1.0.2
Reporter: Steve Loughran

Changes in Hadoop 3.3.x stop the tests compiling/working:

* changes in signature of nominally private classes (HADOOP-17497): fix, update
* HADOOP-16721 - s3a rename throwing more exceptions, but no longer failing if the dest parent doesn't exist. Fix: change s3a.xml
* HADOOP-17531/HADOOP-17620 - distcp moving to listIterator; test failures.
* HADOOP-13327 - tests on syncable which expect files being written to, to be visible. Fix: skip that test

The fix for HADOOP-17497 stops this compiling against Hadoop < 3.3.1. This is unfortunate but I can't see an easy fix. The new signature takes a parameters class, so we can (and already are) adding new config options without breaking this signature again. And I've tagged it as LimitedPrivate so that future developers will know it's used here.
[jira] [Commented] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics
[ https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16817543#comment-16817543 ]

Steve Loughran commented on HBASE-22149:
----------------------------------------

bq. LimitedPrivate({"HBase"}),

Prefer @VisibleForTesting. I really hate the limited private stuff.

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> ------------------------------------------------------------------------
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
> Issue Type: New Feature
> Components: Filesystem Integration
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
> Priority: Critical
> Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase-2.patch, HBASE-22149-hbase-3.patch, HBASE-22149-hbase.patch
>
> (Have been using the name HBOSS for HBase / Object Store Semantics.)
> I've had some thoughts about how to solve the problem of running HBase on object stores. There has been some thought in the past about adding the required semantics to S3Guard, but I have some concerns about that. First, it's mixing complicated solutions to different problems (bridging the gap between a flat namespace and a hierarchical namespace vs. solving inconsistency). Second, it's S3-specific, whereas other object stores could use virtually identical solutions. And third, we can't do things like atomic renames in a true sense. There would have to be some trade-offs specific to HBase's needs and it's better if we can solve that in an HBase-specific module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and considered (HBASE-20431, for one), and maybe that's the right way forward long-term, but it certainly seems to be a hard problem and hasn't been done yet. But I don't know enough of all the internal considerations to make much of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance and provides locking of FileSystem operations to ensure correct semantics. Locking could quite possibly be done on the same ZooKeeper ensemble as an HBase cluster already uses (I'm sure there are some performance considerations here that deserve more attention). I've put together a proof-of-concept on which I've tested some aspects of atomic renames and atomic file creates. Both of these tests fail reliably on a naked s3a instance. I've also done a small YCSB run against a small cluster to sanity check other functionality and was successful. I will post the patch, and my laundry list of things that still need work. The WAL is still placed on HDFS, but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's purely for my convenience in putting it together quickly, as that's where I mostly work. I actually think long-term, if this is accepted as a good solution, it makes sense to live in HBase (or its own repository). It only depends on stable, public APIs in Hadoop and is targeted entirely at HBase's needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based FileSystem that keeps hierarchical metadata in a more appropriate store that would allow the required transactions (maybe a special table in HBase could provide that store itself for other tables), and stores the underlying files with unique identifiers on S3. This allows renames to actually become fast instead of just large atomic operations. It does however place a strong dependency on the metadata store. I have not explored this idea much. My current proof-of-concept has been pleasantly simple, so I think it's the right solution unless it proves unable to provide the required performance characteristics.
[jira] [Commented] (HBASE-22005) Use ByteBuff's refcnt to track the life cycle of data block
[ https://issues.apache.org/jira/browse/HBASE-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789558#comment-16789558 ]

Steve Loughran commented on HBASE-22005:
----------------------------------------

bq. I think 2.7.x is a stable release line, and lots of users are still on it, so it is not likely that we will drop the support for hadoop 2.7.x for our hbase 2.x releases.

License incompatibilities in libraries we dist (aws SDK) and ASF policy mean that we aren't in a position to release new versions; it's built on a version of java that's ~impossible to get hold of. We don't really have a choice in the matter.

> Use ByteBuff's refcnt to track the life cycle of data block
> -----------------------------------------------------------
>
> Key: HBASE-22005
> URL: https://issues.apache.org/jira/browse/HBASE-22005
> Project: HBase
> Issue Type: Sub-task
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Attachments: HBASE-22005.HBASE-21879.v1.patch, HBASE-22005.HBASE-21879.v2.patch
[jira] [Commented] (HBASE-22005) Use ByteBuff's refcnt to track the life cycle of data block
[ https://issues.apache.org/jira/browse/HBASE-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16787773#comment-16787773 ]

Steve Loughran commented on HBASE-22005:
----------------------------------------

bq. And for high latency object storage, such as S3, I do not see any difference between passing a and ?

I did raise that as an option in HADOOP-11867, but it's not what works for Parquet, ORC etc, and it's not what we get from the HTTP APIs anyway. So we'd be converting from stream to byte buffer, then they'd be going back again.

bq. Anyway, we need the StreamCapabilities API to query whether a given stream has the ability, sadly it is only provided in hadoop-2.9+ and we still need to support 2.7...

It can't be backported to 2.7.x as no more releases are coming out there; it's EOL. 2.8.x, though: that could be done without much difficulty.

Have you considered moving to later hadoop libraries? Or more specifically: is there something that's been done there which stops you, or is it just the inertia of the installed base?
[jira] [Commented] (HBASE-22005) Use ByteBuff's refcnt to track the life cycle of data block
[ https://issues.apache.org/jira/browse/HBASE-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16787053#comment-16787053 ]

Steve Loughran commented on HBASE-22005:
----------------------------------------

I think you'll need a plan to continue to work with stores which don't support BB; that includes object stores which ship with HBase support today (hello Azure!) and whose users will be unhappy when things stop working.

bq. I still think those basic fs, such as LocalFileSystem/DistributedFileSystem need the ByteBuffer read/pread method, it's so common to use

I see the world moving away from Posix in two directions:

* near-RAM-speed solid state storage. Here memory access operations make a lot more sense than the stream API, because in hardware these can be part of the memory space of the application. Why copy it into process memory at all, when it can just be memory mapped?
* object storage. Here we go the other way: high latency IO where the cost of a seek() is such that you can see the logs pause and you'll know "hey! it's a GET". There we're looking at async IO APIs, vectored IO ops etc.

I don't expect stores to implement ByteBufferReadable; async vector reads where you provide a list of ranges to read.
[jira] [Commented] (HBASE-22005) Use ByteBuff's refcnt to track the life cycle of data block
[ https://issues.apache.org/jira/browse/HBASE-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786813#comment-16786813 ]

Steve Loughran commented on HBASE-22005:
----------------------------------------

A lot of filesystems don't implement the byte buffer operations, not just through laziness but because the underlying APIs used for data just work at the stream level (e.g. all the http clients); it's pretty suboptimal to try: we'd be streaming into a byte buffer, and the app would be pulling it out thinking they were getting a performance boost when they weren't, etc. See HADOOP-11867 for some discussion of this.

I'll accept a patch to let you use StreamCapabilities to query all the way through a wrapped input stream to see if the feature is available, and, once HADOOP-15691 is in, a check on a filesystem before even opening a file. This should let you decide when to switch to ByteBuffers.
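The probing pattern described in the comment above can be sketched with a local stand-in interface: a wrapping stream forwards the capability query to the stream it wraps, so a caller can check for a feature before relying on it. The interface below mirrors the shape of `org.apache.hadoop.fs.StreamCapabilities` (a single `hasCapability(String)` probe); the capability key is illustrative.

```java
// Sketch of the StreamCapabilities query-through-a-wrapper pattern.
public class CapabilityProbe {
    // Local stand-in for org.apache.hadoop.fs.StreamCapabilities.
    interface StreamCapabilities {
        boolean hasCapability(String capability);
    }

    // An underlying stream that supports ByteBuffer reads.
    static class BaseStream implements StreamCapabilities {
        public boolean hasCapability(String capability) {
            return "in:readbytebuffer".equals(capability);
        }
    }

    // A wrapping stream: forward the probe all the way through to the
    // wrapped stream, rather than answering for itself.
    static class WrappingStream implements StreamCapabilities {
        private final StreamCapabilities inner;
        WrappingStream(StreamCapabilities inner) { this.inner = inner; }
        public boolean hasCapability(String capability) {
            return inner.hasCapability(capability);
        }
    }
}
```

A caller would then switch to the ByteBuffer path only when the probe answers true.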
[jira] [Commented] (HBASE-20774) FSHDFSUtils#isSameHdfs doesn't handle S3 filesystems correctly.
[ https://issues.apache.org/jira/browse/HBASE-20774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687155#comment-16687155 ]

Steve Loughran commented on HBASE-20774:
----------------------------------------

HADOOP-14556 will add a canonical name when you turn delegation tokens (DTs) on.

> FSHDFSUtils#isSameHdfs doesn't handle S3 filesystems correctly.
> ---------------------------------------------------------------
>
> Key: HBASE-20774
> URL: https://issues.apache.org/jira/browse/HBASE-20774
> Project: HBase
> Issue Type: Bug
> Components: Filesystem Integration
> Reporter: Austin Heyne
> Priority: Major
> Labels: S3, S3Native, s3
>
> FSHDFSUtils#isSameHdfs retrieves the Canonical Service Name from Hadoop to
> determine if source and destination are on the same filesystem.
> NativeS3FileSystem, S3FileSystem and presumably S3NativeFileSystem
> (com.amazon) always return null in getCanonicalServiceName(), which
> incorrectly causes isSameHdfs to return false even when they could be the
> same.
> Error encountered while trying to perform bulk load from S3 to HBase on S3
> backed by the same bucket. This is causing bulk loads from S3 to copy all the
> data to the workers and back up to S3.
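The failure mode and one possible fallback can be sketched as below: when both canonical service names are null, equality cannot be concluded from them, so a comparison could fall back to the filesystem URI (scheme plus bucket/authority). The helper is illustrative only, not the actual FSHDFSUtils code:

```java
// Sketch: compare two filesystems when getCanonicalServiceName() may
// return null (as the S3 connectors above do).
import java.net.URI;
import java.util.Objects;

public class FsIdentity {
    public static boolean isSameFs(String canonicalA, String canonicalB,
                                   URI uriA, URI uriB) {
        // Canonical names are authoritative when both are present.
        if (canonicalA != null && canonicalB != null) {
            return canonicalA.equals(canonicalB);
        }
        // Fallback: same scheme and same authority (e.g. same S3 bucket).
        return Objects.equals(uriA.getScheme(), uriB.getScheme())
            && Objects.equals(uriA.getAuthority(), uriB.getAuthority());
    }
}
```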
[jira] [Commented] (HBASE-21149) TestIncrementalBackupWithBulkLoad may fail due to file copy failure
[ https://issues.apache.org/jira/browse/HBASE-21149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650163#comment-16650163 ]

Steve Loughran commented on HBASE-21149:
----------------------------------------

I wouldn't blame distcp here, yet. This hints of a race condition in the distcp setup process: have you kicked off distcp while some of the source files were being written?

> TestIncrementalBackupWithBulkLoad may fail due to file copy failure
> -------------------------------------------------------------------
>
> Key: HBASE-21149
> URL: https://issues.apache.org/jira/browse/HBASE-21149
> Project: HBase
> Issue Type: Test
> Components: backup&restore
> Reporter: Ted Yu
> Assignee: Vladimir Rodionov
> Priority: Major
> Attachments: 21149.v2.txt, HBASE-21149-v1.patch, testIncrementalBackupWithBulkLoad-output.txt
>
> From https://builds.apache.org/job/HBase%20Nightly/job/master/471/testReport/junit/org.apache.hadoop.hbase.backup/TestIncrementalBackupWithBulkLoad/TestIncBackupDeleteTable/ :
> {code}
> 2018-09-03 11:54:30,526 ERROR [Time-limited test] impl.TableBackupClient(235): Unexpected Exception : Failed copy from hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/0f626c66493649daaf84057b8dd71a30_SeqId_205_,hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/ad8df6415bd9459d9b3df76c588d79df_SeqId_205_ to hdfs://localhost:53075/backupUT/backup_1535975655488
> java.io.IOException: Failed copy from hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/0f626c66493649daaf84057b8dd71a30_SeqId_205_,hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/ad8df6415bd9459d9b3df76c588d79df_SeqId_205_ to hdfs://localhost:53075/backupUT/backup_1535975655488
> 	at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.incrementalCopyHFiles(IncrementalTableBackupClient.java:351)
> 	at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.copyBulkLoadedFiles(IncrementalTableBackupClient.java:219)
> 	at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.handleBulkLoad(IncrementalTableBackupClient.java:198)
> 	at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:320)
> 	at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:605)
> 	at org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.TestIncBackupDeleteTable(TestIncrementalBackupWithBulkLoad.java:104)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> {code}
> However, some part of the test output was lost:
> {code}
> 2018-09-03 11:53:36,793 DEBUG [RS:0;765c9ca5ea28:36357] regions ...[truncated 398396 chars]... 8)
> 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> {code}
[jira] [Commented] (HBASE-20429) Support for mixed or write-heavy workloads on non-HDFS filesystems
[ https://issues.apache.org/jira/browse/HBASE-20429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596423#comment-16596423 ] Steve Loughran commented on HBASE-20429: BTW, HADOOP-15691 is my latest iteration of having each FS declare its capabilities. As I've noted at the end, as well as through a new interface, we could expose this as new config options you can look for in fsInstance.getConf().get("option"), provided the FS instances clone their supplied configs and then patch them. This would let you check to see what an FS offered. W.r.t. S3Guard, you need to know what semantics you get: with S3Guard you get consistent listings, but rename is still non-atomic. Thanks for promising to invite me to any discussions; as long as it's not via Amazon Chime or Skype for Business, I'm up for it. > Support for mixed or write-heavy workloads on non-HDFS filesystems > -- > > Key: HBASE-20429 > URL: https://issues.apache.org/jira/browse/HBASE-20429 > Project: HBase > Issue Type: Umbrella >Reporter: Andrew Purtell >Priority: Major > > We can support reasonably well use cases on non-HDFS filesystems, like S3, > where an external writer has loaded (and continues to load) HFiles via the > bulk load mechanism, and then we serve out a read only workload at the HBase > API. > Mixed workloads or write-heavy workloads won't fare as well. In fact, data > loss seems certain. It will depend on the specific filesystem, but all of the > S3 backed Hadoop filesystems suffer from a couple of obvious problems, > notably a lack of atomic rename. > This umbrella will serve to collect some related ideas for consideration. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
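The capability-declaration idea in the comment above (HADOOP-15691) might look something like the minimal probe interface below. All names here are hypothetical stand-ins for illustration, not Hadoop's actual API:

```java
import java.util.Set;

// Hypothetical sketch of a per-filesystem capability probe, in the spirit of
// HADOOP-15691: each store declares which semantics it offers, and callers
// check before relying on them. Capability names are illustrative only.
interface CapabilityProbe {
    boolean hasCapability(String capability);
}

class StubS3Store implements CapabilityProbe {
    // An S3-like store: consistent listings (via something like S3Guard),
    // but no atomic rename.
    private final Set<String> capabilities =
        Set.of("fs.capability.consistent-listing");

    @Override
    public boolean hasCapability(String capability) {
        return capabilities.contains(capability);
    }
}

public class CapabilityCheck {
    public static void main(String[] args) {
        CapabilityProbe fs = new StubS3Store();
        System.out.println(fs.hasCapability("fs.capability.atomic-rename"));     // false
        System.out.println(fs.hasCapability("fs.capability.consistent-listing")); // true
    }
}
```

A caller such as HBase could then refuse to start, or switch strategies, when a required capability is missing instead of discovering the gap via data loss.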
[jira] [Commented] (HBASE-20429) Support for mixed or write-heavy workloads on non-HDFS filesystems
[ https://issues.apache.org/jira/browse/HBASE-20429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584555#comment-16584555 ] Steve Loughran commented on HBASE-20429: One thing which would be good for you all to write down is: what do you expect of an FS in order for things to work? In particular: * create/read/update/delete consistency * listing consistency * which ops are required to be atomic and O(1) * is it ok for create(path, overwrite=false) to be non-atomic? * when you expect things to be written to the store * how long you expect the final close() to take. Identify these things and you can start to see which stores can work, and where you need to involve other things for the semantics you need. > Support for mixed or write-heavy workloads on non-HDFS filesystems > -- > > Key: HBASE-20429 > URL: https://issues.apache.org/jira/browse/HBASE-20429 > Project: HBase > Issue Type: Umbrella >Reporter: Andrew Purtell >Priority: Major -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-20429) Support for mixed or write-heavy workloads on non-HDFS filesystems
[ https://issues.apache.org/jira/browse/HBASE-20429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584555#comment-16584555 ] Steve Loughran edited comment on HBASE-20429 at 8/18/18 1:02 AM: - One thing which would be good for you all to write down is: what are your expectations of an FS to work. in particular * create/read/update/delete consistency * listing consistency * which ops are required to be atomic and O(1) * is it ok for create(path, overwrite=false) to be non-atomic? * when you expect things to be written to store * how long do you expect the final close() to take. Identify these things and you can start to see what stores can work. And show you where you need to involve other things for the semantics you need. was (Author: ste...@apache.org): One thing which would be good for you all to write down is: what are your expectations of an FS to work. in particular * create/read/update/delete consistency * listing consistency * which ops are required to be atomic and O(1) * is it ok for create(path, overwrite=false) to be non-atomic? * when you expect things to be written to store * how long do you expect the final close() to take. Identify these things and you can start to see what stores can work. And show you where you need to involve other thigs for the semantics you need. > Support for mixed or write-heavy workloads on non-HDFS filesystems > -- > > Key: HBASE-20429 > URL: https://issues.apache.org/jira/browse/HBASE-20429 > Project: HBase > Issue Type: Umbrella >Reporter: Andrew Purtell >Priority: Major > > We can support reasonably well use cases on non-HDFS filesystems, like S3, > where an external writer has loaded (and continues to load) HFiles via the > bulk load mechanism, and then we serve out a read only workload at the HBase > API. > Mixed workloads or write-heavy workloads won't fare as well. In fact, data > loss seems certain. 
It will depend in the specific filesystem, but all of the > S3 backed Hadoop filesystems suffer from a couple of obvious problems, > notably a lack of atomic rename. > This umbrella will serve to collect some related ideas for consideration. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20774) FSHDFSUtils#isSameHdfs doesn't handle S3 filesystems correctly.
[ https://issues.apache.org/jira/browse/HBASE-20774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520735#comment-16520735 ] Steve Loughran commented on HBASE-20774: As the S3A javadocs say, "Override getCanonicalServiceName because we don't support tokens in S3A". You will have to fall back to getting the fully qualified path of the root dir, fs.makeQualified(new Path("/")), and then compare by path equality. > FSHDFSUtils#isSameHdfs doesn't handle S3 filesystems correctly. > --- > > Key: HBASE-20774 > URL: https://issues.apache.org/jira/browse/HBASE-20774 > Project: HBase > Issue Type: Bug > Components: Filesystem Integration >Reporter: Austin Heyne >Priority: Major > Labels: S3, S3Native, s3 > > FSHDFSUtils#isSameHdfs retrieves the Canonical Service Name from Hadoop to > determine if source and destination are on the same filesystem. > NativeS3FileSystem, S3FileSystem and presumably S3NativeFileSystem > (com.amazon) always return null in getCanonicalServiceName() which > incorrectly causes isSameHdfs to return false even when they could be the > same. > Error encountered while trying to perform bulk load from S3 to HBase on S3 > backed by the same bucket. This is causing bulk loads from S3 to copy all the > data to the workers and back up to S3. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
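The suggested fallback, comparing stores by their fully qualified root paths rather than by canonical service name, can be sketched with plain java.net.URI comparison standing in for Hadoop's Path qualification (a sketch only; real code would qualify Paths against live FileSystem instances):

```java
import java.net.URI;
import java.util.Locale;

public class SameStoreCheck {
    // Compare two filesystem URIs by scheme + authority, which is the
    // information a qualified root path (fs.makeQualified(new Path("/")))
    // carries. Case-insensitive, as URI schemes and S3 bucket hostnames are.
    static boolean isSameStore(URI a, URI b) {
        String schemeA = a.getScheme() == null ? "" : a.getScheme().toLowerCase(Locale.ROOT);
        String schemeB = b.getScheme() == null ? "" : b.getScheme().toLowerCase(Locale.ROOT);
        String authA = a.getAuthority() == null ? "" : a.getAuthority().toLowerCase(Locale.ROOT);
        String authB = b.getAuthority() == null ? "" : b.getAuthority().toLowerCase(Locale.ROOT);
        return schemeA.equals(schemeB) && authA.equals(authB);
    }

    public static void main(String[] args) {
        URI src = URI.create("s3a://mybucket/hbase/data");
        URI dst = URI.create("s3a://mybucket/backupUT");
        System.out.println(isSameStore(src, dst));                        // true: same bucket
        System.out.println(isSameStore(src, URI.create("hdfs://nn:8020/"))); // false
    }
}
```

With a check like this, a bulk load within one bucket would be recognised as same-filesystem and the copy-down/copy-up round trip avoided.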
[jira] [Commented] (HBASE-20431) Store commit transaction for filesystems that do not support an atomic rename
[ https://issues.apache.org/jira/browse/HBASE-20431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456203#comment-16456203 ] Steve Loughran commented on HBASE-20431: FWIW, I've been discussing with Stephan Ewen on the flink team about allowing an option to create files on s3a without doing the precursor checks (is this really a directory, etc), for people who know what they are doing. We'd always do an S3Guard check (low cost, stops it becoming corrupted), but it'd avoid the caching of the 404 in the AWS load balancers. They are trying to defend against DoS attacks: nobody else has to. This would fix the GaPaG consistency problem by not doing the G before the P. It would be in a new FileSystem/FileContext create() call which returned a builder that supported custom fs-specific options, as hadoop-3 already does for open(). Something I'd like, but not in my schedule right now, though S3 Select depends on it. > Store commit transaction for filesystems that do not support an atomic rename > - > > Key: HBASE-20431 > URL: https://issues.apache.org/jira/browse/HBASE-20431 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Priority: Major > > HBase expects the Hadoop filesystem implementation to support an atomic > rename() operation. HDFS does. The S3 backed filesystems do not. The > fundamental issue is the non-atomic and eventually consistent nature of the > S3 service. A S3 bucket is not a filesystem. S3 is not always immediately > read-your-writes. Object metadata can be temporarily inconsistent just after > new objects are stored. There can be a settling period to ride over. > Renaming/moving objects from one path to another are copy operations with > O(file) complexity and O(data) time followed by a series of deletes with > O(file) complexity. Failures at any point prior to completion will leave the > operation in an inconsistent state. 
The missing atomic rename semantic opens > opportunities for corruption and data loss, which may or may not be > repairable with HBCK. > Handling this at the HBase level could be done with a new multi-step > filesystem transaction framework. Call it StoreCommitTransaction. > SplitTransaction and MergeTransaction are well established cases where even > on HDFS we have non-atomic filesystem changes and are our implementation > template for the new work. In this new StoreCommitTransaction we'd be moving > flush and compaction temporaries out of the temporary directory into the > region store directory. On HDFS the implementation would be easy. We can rely > on the filesystem's atomic rename semantics. On S3 it would be work: First we > would build the list of objects to move, then copy each object into the > destination, and then finally delete all objects at the original path. We > must handle transient errors with retry strategies appropriate for the action > at hand. We must handle serious or permanent errors where the RS doesn't need > to be aborted with a rollback that cleans it all up. Finally, we must handle > permanent errors where the RS must be aborted with a rollback during region > open/recovery. Note that after all objects have been copied and we are > deleting obsolete source objects we must roll forward, not back. To support > recovery after an abort we must utilize the WAL to track transaction > progress. Put markers in for StoreCommitTransaction start and completion > state, with details of the store file(s) involved, so it can be rolled back > during region recovery at open. This will be significant work in HFile, > HStore, flusher, compactor, and HRegion. Wherever we use HDFS's rename now we > would substitute the running of this new multi-step filesystem transaction. 
> We need to determine this for certain, but I believe on S3 the PUT or > multipart upload of an object must complete before the object is visible, so > we don't have to worry about the case where an object is visible before fully > uploaded as part of normal operations. So an individual object copy will > either happen entirely and the target will then become visible, or it won't > and the target won't exist. > S3 has an optimization, PUT COPY > (https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html), which > the AmazonClient embedded in S3A utilizes for moves. When designing the > StoreCommitTransaction be sure to allow for filesystem implementations that > leverage a server side copy operation. Doing a get-then-put should be > optional. (Not sure Hadoop has an interface that advertises this capability > yet; we can add one if not.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
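The builder-based create() call floated in the comment above can be sketched as follows. The names (opt(), the performance option key) are illustrative assumptions, not the final Hadoop API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a builder-style create(): callers pass fs-specific
// options, and a store client that "knows what it is doing" can ask to skip
// the precursor existence/directory checks. Names are hypothetical.
class CreateFileBuilder {
    private final String path;
    private final Map<String, String> options = new HashMap<>();
    private boolean overwrite;

    CreateFileBuilder(String path) { this.path = path; }

    CreateFileBuilder overwrite(boolean overwrite) {
        this.overwrite = overwrite;
        return this;
    }

    // opt(): a hint the filesystem may ignore if it doesn't understand it.
    CreateFileBuilder opt(String key, String value) {
        options.put(key, value);
        return this;
    }

    String build() {
        // A real implementation would open an output stream; here we just
        // render the request so the option flow is visible.
        return "create(" + path + ", overwrite=" + overwrite + ", opts=" + options + ")";
    }
}

public class BuilderCreateDemo {
    public static void main(String[] args) {
        String request = new CreateFileBuilder("s3a://bucket/hbase/region/file")
            .overwrite(false)
            .opt("fs.s3a.create.no-precursor-checks", "true") // hypothetical option name
            .build();
        System.out.println(request);
    }
}
```

The point of the builder shape is that new, store-specific options can be added without breaking the FileSystem contract, exactly as hadoop-3's openFile()-style builders do for read paths.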
[jira] [Commented] (HBASE-20433) HBase Export Snapshot utility does not close FileSystem instances
[ https://issues.apache.org/jira/browse/HBASE-20433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456185#comment-16456185 ] Steve Loughran commented on HBASE-20433: Maybe we could do a special build of hadoop-aws which can be set to print a stack trace on creation if debug is set. Or actually do this in FileSystem for broader debugging of all FS leaks. I can see the value in that from time to time. > HBase Export Snapshot utility does not close FileSystem instances > - > > Key: HBASE-20433 > URL: https://issues.apache.org/jira/browse/HBASE-20433 > Project: HBase > Issue Type: Bug > Components: Client, fs, snapshots >Affects Versions: 1.2.6, 1.4.3 >Reporter: Voyta >Priority: Major > > It seems org.apache.hadoop.hbase.snapshot.ExportSnapshot disallows FileSystem > instance caching. > When the verifySnapshot method is run it often calls methods like > org.apache.hadoop.hbase.util.FSUtils#getRootDir that instantiate FileSystem > but never calls org.apache.hadoop.fs.FileSystem#close. This behaviour > allows allocation of unwanted objects, potentially causing memory leaks. > Related issue: https://issues.apache.org/jira/browse/HADOOP-15392 > > Expectation: > * HBase should properly release/close all objects, especially FileSystem > instances. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
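The "print a stack trace on creation" debugging idea could look like this pure-JDK sketch. A real patch would live in hadoop-common's FileSystem behind a debug flag; this stand-in just shows the mechanism:

```java
// Sketch of creation-site tracking for hunting resource leaks: capture a
// Throwable when the instance is constructed, and dump its stack if the
// instance is found unclosed later. Pure-JDK stand-in for a FileSystem patch.
class TrackedResource implements AutoCloseable {
    private final Throwable creationSite = new Throwable("created here");
    private boolean closed;

    @Override
    public void close() { closed = true; }

    boolean isClosed() { return closed; }

    // Where the (possibly leaked) instance was created, for logging.
    StackTraceElement[] creationStack() { return creationSite.getStackTrace(); }
}

public class LeakDemo {
    public static void main(String[] args) {
        TrackedResource fs = new TrackedResource(); // never closed: a "leak"
        if (!fs.isClosed()) {
            System.out.println("unclosed resource; created at: " + fs.creationStack()[0]);
        }
    }
}
```

Capturing the Throwable is cheap enough for a debug build, and the recorded stack points straight at the call site (e.g. a getRootDir-style helper) that forgot to close its FileSystem.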
[jira] [Commented] (HBASE-20431) Store commit transaction for filesystems that do not support an atomic rename
[ https://issues.apache.org/jira/browse/HBASE-20431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452436#comment-16452436 ] Steve Loughran commented on HBASE-20431: [~mbenjamin]: I don't want non-AWS stores to need S3Guard, and there's always the possibility that AWS S3 may itself become consistent. PUT, COPY or MPU should be all that's needed to commit a single file, which is all [~apurtell] thinks he needs > Store commit transaction for filesystems that do not support an atomic rename > - > > Key: HBASE-20431 > URL: https://issues.apache.org/jira/browse/HBASE-20431 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Priority: Major -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20431) Store commit transaction for filesystems that do not support an atomic rename
[ https://issues.apache.org/jira/browse/HBASE-20431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452099#comment-16452099 ] Steve Loughran commented on HBASE-20431: [~mackrorysd] says: bq. one could modify S3Guard to prevent a destination directory from being visible until it's complete we can, but it'd restrict the code to requiring a DDB, which would make the WDC and Ceph groups sad. I think Andrew could get by without it, if a single file is all that's needed for the commit. > Store commit transaction for filesystems that do not support an atomic rename > - > > Key: HBASE-20431 > URL: https://issues.apache.org/jira/browse/HBASE-20431 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Priority: Major -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20431) Store commit transaction for filesystems that do not support an atomic rename
[ https://issues.apache.org/jira/browse/HBASE-20431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450384#comment-16450384 ] Steve Loughran commented on HBASE-20431: S3Guard is only needed when you want consistency on S3A; Amazon has its own (consistent EMRFS), and other people (WDC) sell products which are consistent out of the box. If Ceph is consistent, all is good and you don't need anything else. Trying to work with an inconsistent S3 is dangerous unless you explicitly put long delays in; for example, in a recovery, always wait a minute or more before listing. bq. in testing I noticed some times we'd get back (paraphrased) "200 Internal Error, please retry" Not seen that; I assume it's handled in the AWS client. We do have retries on some throttles and transient errors, especially that final POST of an MPU, but 200 isn't considered an error code; 503 is throttle, I believe. See S3AUtils.translateException() for our understanding there. bq. We also have in our design scope running against Ceph's radosgw so I don't know if we can rely on it totally, but we can take advantage of it if we detect we are running against S3 proper. Raw AWS S3 *absolutely* keeps the output of an MPU invisible until the final POST of the ordered list of checksums of the uploaded parts. You get billed for all that data, so it's good to have code to list & purge it (the hadoop s3guard CLI does). Provided the other stores you work with have the same MPU visibility semantics, all will be well. Who to ask about Ceph? # Maybe [~stevewatt] has a suggestion? It's good to ask the developers to see what they think their system should do... # [~iyonger] has been testing S3A and Ceph. # And I think maybe now we should make sure there is an explicit test for S3A which verifies that uncommitted MPUs aren't visible. I'm sure that's covered implicitly, but having it drawn out into a single method is easier to look at when there are failures. bq. 
I would not expect you to volunteer code, no worries! (That would be obnoxious... (smile)) Thanks. I'd volunteer Ewan and Thomas but (a) they don't listen to me and (b) they're going to do the API you need with a goal of having it work with other stores too. FYI [~fabbri] > Store commit transaction for filesystems that do not support an atomic rename > - > > Key: HBASE-20431 > URL: https://issues.apache.org/jira/browse/HBASE-20431 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Priority: Major
[jira] [Commented] (HBASE-20431) Store commit transaction for filesystems that do not support an atomic rename
[ https://issues.apache.org/jira/browse/HBASE-20431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450053#comment-16450053 ] Steve Loughran commented on HBASE-20431: * You are correct: neither PUT nor multipart upload ("MPU") has any visible outcome until it is complete. MPUs can be completed in a POST from a different host than that/those uploading the blocks, which is how we implement the S3A committers. Talk to [~ehiggs] & [~Thomas Demoor] about their ideas for making that public. If you could use a single MPU to commit the final output, you'd get a nice O(1) atomic operation. * PUT-COPY is atomic, but it's a 6-10MB/s atomic operation; it's essentially what you get when you rename() a single file, though there we DELETE the source afterwards. We could expose it for S3 & the other stores which offer a similar operation. One thought to consider: although it's O(data), the client-side bandwidth is ~0, so you can do most of the copies in parallel. * You aren't worrying about S3 consistency here. For AWS S3, life is easier if you mandate using S3Guard for the consistency layer. Otherwise, you can turn on fault injection in the S3A connector and see what breaks... Looking forward to seeing what you do here; offering some consultancy on design and test strategies, while carefully not volunteering to provide any code... > Store commit transaction for filesystems that do not support an atomic rename > - > > Key: HBASE-20431 > URL: https://issues.apache.org/jira/browse/HBASE-20431 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Purtell >Priority: Major
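The single-MPU commit discussed above, where uploaded parts stay invisible until one final POST publishes the object, can be simulated in plain Java to show the protocol shape (this is a simulation, not the AWS SDK API):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the "single MPU as an O(1) commit": workers upload parts that
// readers cannot see, and one complete() call publishes the whole object
// atomically. Pure-JDK model of the S3 multipart-upload visibility rules.
class MultipartUpload {
    private final Map<Integer, String> parts = new ConcurrentHashMap<>();
    private volatile String committedObject; // null until complete() runs

    // Upload a part; nothing is visible to readers yet.
    void uploadPart(int partNumber, String data) {
        parts.put(partNumber, data);
    }

    // The single final POST that makes the object visible: the atomic commit.
    void complete() {
        StringBuilder sb = new StringBuilder();
        new TreeMap<>(parts).values().forEach(sb::append); // parts in order
        committedObject = sb.toString();
    }

    // Readers see nothing until complete() has run.
    String read() { return committedObject; }
}

public class MpuCommitDemo {
    public static void main(String[] args) {
        MultipartUpload mpu = new MultipartUpload();
        mpu.uploadPart(2, "world");
        mpu.uploadPart(1, "hello ");
        System.out.println(mpu.read()); // null: the uncommitted MPU is invisible
        mpu.complete();
        System.out.println(mpu.read()); // hello world
    }
}
```

A StoreCommitTransaction built on this shape would get its atomicity from the single complete() call rather than from rename(), which is exactly why the comment treats MPU completion as the interesting commit primitive.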
[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems
[ https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411165#comment-16411165 ] Steve Loughran commented on HBASE-20226: Amazon throttles DELETE to the same shard, so the speedup will be sublinear, even though the cost of a delete/bulk delete is low in terms of network traffic. If you are doing bulk deletes in more than one thread, it's probably best to do a bit of shuffling of the list of directories to delete before queuing the operations. > Performance Improvement Taking Large Snapshots In Remote Filesystems > > > Key: HBASE-20226 > URL: https://issues.apache.org/jira/browse/HBASE-20226 > Project: HBase > Issue Type: Improvement > Components: snapshots >Affects Versions: 1.4.0 > Environment: HBase 1.4.0 running on an AWS EMR cluster with the > hbase.rootdir set to point to a folder in S3 >Reporter: Saad Mufti >Priority: Minor > Attachments: HBASE-20226..01.patch > > > When taking a snapshot of any table, one of the last steps is to delete the > region manifests, which have already been rolled up into a larger overall > manifest and thus have redundant information. > This proposal is to do the deletion in a thread pool bounded by > hbase.snapshot.thread.pool.max . For large tables with a lot of regions, the > current single threaded deletion is taking longer than all the rest of the > snapshot tasks when the Hbase data and the snapshot folder are both in a > remote filesystem like S3. > I have a patch for this proposal almost ready and will submit it tomorrow for > feedback, although I haven't had a chance to write any tests yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
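The shuffling advice above can be sketched as follows. The delete itself is simulated with a print; a real implementation would call FileSystem.delete() in each task, with the pool bounded by hbase.snapshot.thread.pool.max:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: shuffle the manifest paths before queuing parallel deletes, so the
// requests spread across S3 key ranges instead of hammering one shard.
public class ShuffledDeleteDemo {
    public static void main(String[] args) throws InterruptedException {
        List<String> manifests = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            manifests.add("/snapshot/region-" + i + "/manifest");
        }
        Collections.shuffle(manifests); // spread load across key ranges

        ExecutorService pool = Executors.newFixedThreadPool(4); // bounded pool
        for (String path : manifests) {
            pool.submit(() -> System.out.println("delete " + path)); // stand-in for fs.delete()
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Because neighbouring region directories tend to share key prefixes, submitting them in sorted order concentrates the DELETE load on one shard; shuffling costs O(n) and needs no S3 knowledge at all.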
[jira] [Commented] (HBASE-20123) Backup test fails against hadoop 3
[ https://issues.apache.org/jira/browse/HBASE-20123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386757#comment-16386757 ] Steve Loughran commented on HBASE-20123: That looks like a branch-2 stack trace; HADOOP-13626 changed CopyListingFileStatus to not be a subclass of FileStatus, instead explicitly marshalling the permissions. At the same time, that getSymlink() call in readFields() is a branch-3 operation; it's in an assert at the end {code} assert (isDirectory() && getSymlink() == null) || !isDirectory(); {code} I believe that assertion is wrong. It's assuming that getSymlink() returns null if there is no symlink, but instead it raises an exception. And as it's an assert(), it's only going to show up in JVMs with assertions turned on. I'd suggest that someone (you?) files a JIRA against Hadoop with a patch that changes the assertion to something like {code} assert !(isDirectory() && isSymlink()); {code} that is, you can't be both a dir and a symlink. > Backup test fails against hadoop 3 > -- > > Key: HBASE-20123 > URL: https://issues.apache.org/jira/browse/HBASE-20123 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Priority: Major > > When running backup unit test against hadoop3, I saw: > {code} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 88.862 s <<< FAILURE! - in > org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes > [ERROR] > testBackupMultipleDeletes(org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes) > Time elapsed: 86.206 s <<< ERROR! 
> java.io.IOException: java.io.IOException: Failed copy from > hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 to > hdfs://localhost:40578/backupUT > at > org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes.testBackupMultipleDeletes(TestBackupMultipleDeletes.java:82) > Caused by: java.io.IOException: Failed copy from > hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 to > hdfs://localhost:40578/backupUT > at > org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes.testBackupMultipleDeletes(TestBackupMultipleDeletes.java:82) > {code} > In the test output, I found: > {code} > 2018-03-03 14:46:10,858 ERROR [Time-limited test] > mapreduce.MapReduceBackupCopyJob$BackupDistCp(237): java.io.IOException: Path > hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 is not a symbolic > link > java.io.IOException: Path > hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 is not a symbolic > link > at org.apache.hadoop.fs.FileStatus.getSymlink(FileStatus.java:338) > at org.apache.hadoop.fs.FileStatus.readFields(FileStatus.java:461) > at > org.apache.hadoop.tools.CopyListingFileStatus.readFields(CopyListingFileStatus.java:155) > at > org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2308) > at > org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:163) > at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:91) > at > org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90) > at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84) > at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382) > at > org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob$BackupDistCp.createInputFileListing(MapReduceBackupCopyJob.java:297) > at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153) > at > 
org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob$BackupDistCp.execute(MapReduceBackupCopyJob.java:196) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:126) > at > org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob.copy(MapReduceBackupCopyJob.java:408) > at > org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.incrementalCopyHFiles(IncrementalTableBackupClient.java:348) > at > org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:290) > at > org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:605) > {code} > It seems the failure was related to how we use distcp. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
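The assertion bug discussed in the comment above can be shown with a toy model (a hypothetical stand-in class, not the real org.apache.hadoop.fs.FileStatus): getSymlink() throws when there is no symlink, so the original assert blows up instead of evaluating to true, while the suggested isSymlink()-based assertion holds.

```java
import java.io.IOException;

// Minimal model of the FileStatus behaviour under discussion.
class StatusModel {
    private final boolean dir;
    private final String symlink;   // null when the path is not a symlink

    StatusModel(boolean dir, String symlink) {
        this.dir = dir;
        this.symlink = symlink;
    }

    boolean isDirectory() { return dir; }
    boolean isSymlink() { return symlink != null; }

    // Mirrors the real API: throws rather than returning null.
    String getSymlink() throws IOException {
        if (symlink == null) throw new IOException("Path is not a symbolic link");
        return symlink;
    }
}
```

For a plain directory, !(isDirectory() && isSymlink()) is true, but the original form dies in getSymlink() before the comparison ever runs.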
[jira] [Commented] (HBASE-7608) Considering Java 8
[ https://issues.apache.org/jira/browse/HBASE-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361341#comment-16361341 ] Steve Loughran commented on HBASE-7608: --- Is it time to close this as done/worksforme? > Considering Java 8 > -- > > Key: HBASE-7608 > URL: https://issues.apache.org/jira/browse/HBASE-7608 > Project: HBase > Issue Type: Umbrella >Reporter: Andrew Purtell >Priority: Trivial > > Musings (as subtasks) on experimental ideas for when JRE8 is a viable runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-19289) CommonFSUtils$StreamLacksCapabilityException: hflush when running test against hadoop3 beta1
[ https://issues.apache.org/jira/browse/HBASE-19289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283472#comment-16283472 ] Steve Loughran commented on HBASE-19289: What about giving the property some name to make clear its experimental/risky? ""hbase.experimental.stream.capability.enforce.disabled" Then if people set it, well, "told you so" > CommonFSUtils$StreamLacksCapabilityException: hflush when running test > against hadoop3 beta1 > > > Key: HBASE-19289 > URL: https://issues.apache.org/jira/browse/HBASE-19289 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Mike Drob > Attachments: 19289.v1.txt, 19289.v2.txt, HBASE-19289.patch, > HBASE-19289.v2.patch > > > As of commit d8fb10c8329b19223c91d3cda6ef149382ad4ea0 , I encountered the > following exception when running unit test against hadoop3 beta1: > {code} > testRefreshStoreFiles(org.apache.hadoop.hbase.regionserver.TestHStore) Time > elapsed: 0.061 sec <<< ERROR! > java.io.IOException: cannot get log writer > at > org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173) > at > org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962) > Caused by: > org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: > hflush > at > org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220) > at > 
org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173) > at > org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19289) CommonFSUtils$StreamLacksCapabilityException: hflush when running test against hadoop3 beta1
[ https://issues.apache.org/jira/browse/HBASE-19289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280708#comment-16280708 ] Steve Loughran commented on HBASE-19289: looking at the patch * it can take time to start and stop the server, good to see you making this a class rule. * failures in stop should be caught & logged, since raising them could hide the underlying exception triggering a test failure. (I don't know enough about custom rules here to know for sure, just based on test teardown method experience) > CommonFSUtils$StreamLacksCapabilityException: hflush when running test > against hadoop3 beta1 > > > Key: HBASE-19289 > URL: https://issues.apache.org/jira/browse/HBASE-19289 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Mike Drob > Attachments: 19289.v1.txt, 19289.v2.txt, HBASE-19289.patch > > > As of commit d8fb10c8329b19223c91d3cda6ef149382ad4ea0 , I encountered the > following exception when running unit test against hadoop3 beta1: > {code} > testRefreshStoreFiles(org.apache.hadoop.hbase.regionserver.TestHStore) Time > elapsed: 0.061 sec <<< ERROR! 
> java.io.IOException: cannot get log writer > at > org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173) > at > org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962) > Caused by: > org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: > hflush > at > org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173) > at > org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
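The teardown pattern suggested in the comment above can be sketched in plain Java (a stand-in for what would live in a JUnit ExternalResource or @AfterClass method; all names are illustrative): catch and log failures in stop() so they cannot mask the exception that actually failed the test.

```java
// Catching and logging stop() failures instead of rethrowing them, so a
// teardown problem never hides the real test failure.
class GuardedStop {
    interface Service {
        void stop() throws Exception;
    }

    // Returns the suppressed exception (or null) so callers can still log it.
    static Exception stopQuietly(Service s) {
        try {
            s.stop();
            return null;
        } catch (Exception e) {
            System.err.println("Ignoring failure in stop(): " + e);
            return e;
        }
    }
}
```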
[jira] [Commented] (HBASE-19289) CommonFSUtils$StreamLacksCapabilityException: hflush when running test against hadoop3 beta1
[ https://issues.apache.org/jira/browse/HBASE-19289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261454#comment-16261454 ] Steve Loughran commented on HBASE-19289: ps: Are people using HBase against file:// today? If so, they've not been getting the persistence/durability HBase needs. Tell them to stop it. > CommonFSUtils$StreamLacksCapabilityException: hflush when running test > against hadoop3 beta1 > > > Key: HBASE-19289 > URL: https://issues.apache.org/jira/browse/HBASE-19289 > Project: HBase > Issue Type: Test >Reporter: Ted Yu > Attachments: 19289.v1.txt, 19289.v2.txt > > > As of commit d8fb10c8329b19223c91d3cda6ef149382ad4ea0 , I encountered the > following exception when running unit test against hadoop3 beta1: > {code} > testRefreshStoreFiles(org.apache.hadoop.hbase.regionserver.TestHStore) Time > elapsed: 0.061 sec <<< ERROR! > java.io.IOException: cannot get log writer > at > org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173) > at > org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962) > Caused by: > org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: > hflush > at > org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195) > at > 
org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173) > at > org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19289) CommonFSUtils$StreamLacksCapabilityException: hflush when running test against hadoop3 beta1
[ https://issues.apache.org/jira/browse/HBASE-19289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261449#comment-16261449 ] Steve Loughran commented on HBASE-19289: If people really want hbase -> file:// * they'd need a distributed file:// or some shared NFS server * it'd presumably need its own RAID > 0 to do checksumming; so checksum fs is moot I'd look at seeing whether checksumfs could actually bypass its checksum, say if we set the property "bytes per checksum == 0" as the secret "no, turn me off" switch. But people would probably then use it for performance and then be upset when all their data got corrupted without anything noticing. It's too critical a layer under HDFS really. I was thinking about what if we added a raw:// URL which bonded directly to the raw local fs, but RawLocalFileSystem has an expectation that file:// is its scheme and returns it in getURI(), forcing you back to ChecksumFileSystem. I believe the way to do this is * subclass RawLocalFileSystem * give it a new scheme, like say "raw" * have it remember its URI in initialize() and return it in getURI() * register it (statically, dynamically) > CommonFSUtils$StreamLacksCapabilityException: hflush when running test > against hadoop3 beta1 > > > Key: HBASE-19289 > URL: https://issues.apache.org/jira/browse/HBASE-19289 > Project: HBase > Issue Type: Test >Reporter: Ted Yu > Attachments: 19289.v1.txt, 19289.v2.txt > > > As of commit d8fb10c8329b19223c91d3cda6ef149382ad4ea0 , I encountered the > following exception when running unit test against hadoop3 beta1: > {code} > testRefreshStoreFiles(org.apache.hadoop.hbase.regionserver.TestHStore) Time > elapsed: 0.061 sec <<< ERROR! 
> java.io.IOException: cannot get log writer > at > org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173) > at > org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962) > Caused by: > org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: > hflush > at > org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173) > at > org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
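The four steps listed in the comment above can be modelled with a toy filesystem hierarchy (not the real org.apache.hadoop.fs.FileSystem API; class names and the registry map are illustrative): a subclass that remembers the URI handed to initialize() so getURI() reports the "raw" scheme, plus a static registration against that scheme.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Toy model of: subclass RawLocalFileSystem, give it the scheme "raw",
// remember the URI from initialize(), and register the class by scheme.
class FsModel {
    static class RawLocalFs {
        protected URI uri = URI.create("file:///");   // base class is wired to file://
        public URI getURI() { return uri; }
        public void initialize(URI name) { /* base ignores the requested scheme */ }
    }

    static class RawFs extends RawLocalFs {
        @Override
        public void initialize(URI name) { this.uri = name; }  // remember it
    }

    // Stand-in for the (static or dynamic) scheme-to-implementation registry.
    static final Map<String, Class<? extends RawLocalFs>> REGISTRY = new HashMap<>();
    static { REGISTRY.put("raw", RawFs.class); }
}
```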
[jira] [Commented] (HBASE-19289) CommonFSUtils$StreamLacksCapabilityException: hflush when running test against hadoop3 beta1
[ https://issues.apache.org/jira/browse/HBASE-19289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258446#comment-16258446 ] Steve Loughran commented on HBASE-19289: Closed HADOOP-15051 as wontfix. LocalFS output streams don't declare their support for hflush/hsync for the following reason, as covered in HADOOP-13327 (outstanding, reviews welcome) h3. Output streams which do not implement the flush/persistence semantics of hflush/hsync MUST NOT declare that their streams have that capability. LocalFileSystem is a subclass of ChecksumFileSystem; ChecksumFileSystem output streams don't implement hflush/hsync, therefore it's the correct behaviour in the Hadoop code. If HBase requires the methods for the correct persistence of its data, then it cannot safely use localFS as the destination of its output. Its check is therefore also the correct behaviour. In which case, "expressly tell folks not to run HBase on top of LocalFileSystem," is the correct action on your part. People must not be using the local FS as a direct destination of HBase output. > CommonFSUtils$StreamLacksCapabilityException: hflush when running test > against hadoop3 beta1 > > > Key: HBASE-19289 > URL: https://issues.apache.org/jira/browse/HBASE-19289 > Project: HBase > Issue Type: Test >Reporter: Ted Yu > Attachments: 19289.v1.txt > > > As of commit d8fb10c8329b19223c91d3cda6ef149382ad4ea0 , I encountered the > following exception when running unit test against hadoop3 beta1: > {code} > testRefreshStoreFiles(org.apache.hadoop.hbase.regionserver.TestHStore) Time > elapsed: 0.061 sec <<< ERROR! 
> java.io.IOException: cannot get log writer > at > org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173) > at > org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962) > Caused by: > org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: > hflush > at > org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179) > at > org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173) > at > org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18784) Use of filesystem that requires hflush / hsync / append / etc should query outputstream capabilities
[ https://issues.apache.org/jira/browse/HBASE-18784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173353#comment-16173353 ] Steve Loughran commented on HBASE-18784: {{StreamCapabilities}} is on Hadoop 2.9, so you can start planning it earlier. Also: I've been playing with using it for input stream capabilities too (CanUnbuffer, seek()), etc. > Use of filesystem that requires hflush / hsync / append / etc should query > outputstream capabilities > > > Key: HBASE-18784 > URL: https://issues.apache.org/jira/browse/HBASE-18784 > Project: HBase > Issue Type: Improvement > Components: Filesystem Integration >Affects Versions: 1.4.0, 2.0.0-alpha-2 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Blocker > Fix For: 2.1.0, 1.5.0 > > > In places where we rely on the underlying filesystem holding up the promises > of hflush/hsync (most importantly the WAL), we should use the new interfaces > provided by HDFS-11644 to fail loudly when they are not present (e.g. on S3, > on EC mounts, etc). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
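The capability-probe pattern discussed in this issue can be sketched as follows. The interface and method names below are illustrative stand-ins for Hadoop's StreamCapabilities and HBase's CommonFSUtils check, not the real API: before trusting a stream with WAL data, ask whether it really implements hflush and fail loudly if it does not.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

// Fail loudly when a stream cannot honour hflush, instead of silently
// losing durability on filesystems like S3 or EC mounts.
class CapabilityCheck {
    interface Capabilities {
        boolean hasCapability(String capability);
    }

    static void requireHflush(OutputStream out) {
        boolean ok = out instanceof Capabilities
            && ((Capabilities) out).hasCapability("hflush");
        if (!ok) throw new IllegalStateException("stream lacks capability: hflush");
    }

    // A stream that genuinely advertises hflush, for illustration.
    static class HflushStream extends ByteArrayOutputStream implements Capabilities {
        @Override
        public boolean hasCapability(String c) { return "hflush".equals(c); }
    }
}
```

A plain stream that never declared the capability is rejected up front, which is exactly the "fail loudly" behaviour the issue asks for.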
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057556#comment-16057556 ] Steve Loughran commented on HBASE-17125: bq. Then why don't you implement it by yourself? If you think it is easy, then please implement it. Generally, as a committer it's actually more productive to nurture other developers into working towards what you believe to be the right answer than to do it yourself. As well as sharing some of your unrealistic set of deliverables with others, you can be the reviewer who gets the stuff in, instead of having a patch you have to chase other people to review. Long term: the more people you can get to collaborate, the better for the project all round. No opinions on the patch, just making sure everyone works together on this. Thanks. > Inconsistent result when use filter to read data > > > Key: HBASE-17125 > URL: https://issues.apache.org/jira/browse/HBASE-17125 > Project: HBase > Issue Type: Bug >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Critical > Fix For: 2.0.0 > > Attachments: example.diff, HBASE-17125.master.001.patch, > HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, > HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, > HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, > HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, > HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, > HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, > HBASE-17125.master.011.patch, HBASE-17125.master.no-specified-filter.patch > > > Assume a column's max versions is 3, then we write 4 versions of this column. > The oldest version doesn't remove immediately. But from the user view, the > oldest version has gone. When user use a filter to query, if the filter skip > a new version, then the oldest version will be seen again. But after compact > the region, then the oldest version will never be seen. 
So it is weird for > user. The query will get inconsistent result before and after region > compaction. > The reason is matchColumn method of UserScanQueryMatcher. It first check the > cell by filter, then check the number of versions needed. So if the filter > skip the new version, then the oldest version will be seen again when it is > not removed. > Have a discussion offline with [~Apache9] and [~fenghh], now we have two > solution for this problem. The first idea is check the number of versions > first, then check the cell by filter. As the comment of setFilter, the filter > is called after all tests for ttl, column match, deletes and max versions > have been run. > {code} > /** >* Apply the specified server-side filter when performing the Query. >* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests >* for ttl, column match, deletes and max versions have been run. >* @param filter filter to run on the server >* @return this for invocation chaining >*/ > public Query setFilter(Filter filter) { > this.filter = filter; > return this; > } > {code} > But this idea has another problem, if a column's max version is 5 and the > user query only need 3 versions. It first check the version's number, then > check the cell by filter. So the cells number of the result may less than 3. > But there are 2 versions which don't read anymore. > So the second idea has three steps. > 1. check by the max versions of this column > 2. check the kv by filter > 3. check the versions which user need. > But this will lead the ScanQueryMatcher more complicated. And this will break > the javadoc of Query.setFilter. > Now we don't have a final solution for this problem. Suggestions are welcomed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
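The ordering problem described in this issue can be illustrated with a toy model (plain ints standing in for cell versions, newest first; class and method names are hypothetical, not ScanQueryMatcher code): four versions of a column with max-versions 3, and a filter that skips the newest version. Checking the filter before the version count lets the already-expired fourth version leak back into the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Two orderings of "apply filter" vs "apply max-versions" over the same
// newest-first version list, showing they produce different results.
class MatchOrder {
    static List<Integer> filterFirst(List<Integer> versions, IntPredicate keep, int max) {
        List<Integer> out = new ArrayList<>();
        for (int v : versions) {
            if (!keep.test(v)) continue;        // filter before version check
            if (out.size() < max) out.add(v);
        }
        return out;
    }

    static List<Integer> versionsFirst(List<Integer> versions, IntPredicate keep, int max) {
        List<Integer> out = new ArrayList<>();
        int seen = 0;
        for (int v : versions) {
            if (++seen > max) break;            // version check before filter
            if (keep.test(v)) out.add(v);
        }
        return out;
    }
}
```

With versions [4, 3, 2, 1] (4 newest) and a filter that skips 4, filter-first returns the supposedly-gone oldest version while versions-first keeps it hidden, which is the pre/post-compaction inconsistency the reporter describes.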
[jira] [Commented] (HBASE-17878) java.lang.NoSuchMethodError: org.joda.time.format.DateTimeFormatter.withZoneUTC()Lorg/joda/time/format/DateTimeFormatter when starting HBase with hbase.rootdir on S3
[ https://issues.apache.org/jira/browse/HBASE-17878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993396#comment-15993396 ] Steve Loughran commented on HBASE-17878: AWS SDK uses Joda time to generate authentication strings; due to changes in JVMs, it needs a version >= 2.8.1., but should be fairly forgiving as to which version after that. Hadoop trunk has switched to a fully-shaded version of the AWS SDK, more to deal with jackson versions rather than Joda Time, but again, it may work here. That is still stabilising: adding 50MB of .class has its own unexpected consequences > java.lang.NoSuchMethodError: > org.joda.time.format.DateTimeFormatter.withZoneUTC()Lorg/joda/time/format/DateTimeFormatter > when starting HBase with hbase.rootdir on S3 > - > > Key: HBASE-17878 > URL: https://issues.apache.org/jira/browse/HBASE-17878 > Project: HBase > Issue Type: Bug > Components: master >Reporter: Xiang Li >Assignee: Xiang Li >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-17878.master.000.patch, jruby-core-dep-tree.txt > > > When setting up HBASE-17437 (Support specifying a WAL directory outside of > the root directory), we specify > (1) hbase.rootdir on s3a > (2) hbase.wal.dir on HDFS > When starting HBase, the following exception is thrown: > {code} > Caused by: java.lang.NoSuchMethodError: > org.joda.time.format.DateTimeFormatter.withZoneUTC()Lorg/joda/time/format/DateTimeFormatter; > at > com.amazonaws.auth.internal.AWS4SignerUtils.(AWS4SignerUtils.java:26) > at > com.amazonaws.auth.internal.AWS4SignerRequestParams.(AWS4SignerRequestParams.java:85) > at com.amazonaws.auth.AWS4Signer.sign(AWS4Signer.java:184) > at > com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:709) > at > com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489) > at > com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310) > at > 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785) > at > com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1107) > at > com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1070) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:232) > at > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) > at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) > at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:1007) > at > org.apache.hadoop.hbase.util.FSUtils.isValidWALRootDir(FSUtils.java:1050) > at > org.apache.hadoop.hbase.util.FSUtils.getWALRootDir(FSUtils.java:1032) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:627) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.(HRegionServer.java:570) > at org.apache.hadoop.hbase.master.HMaster.(HMaster.java:393) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2456) > ... 5 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17877) Replace/improve HBase's byte[] comparator
[ https://issues.apache.org/jira/browse/HBASE-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973578#comment-15973578 ] Steve Loughran commented on HBASE-17877: bq. Also while looking at some UnsignedBytes function for the guava version we were using I noticed that guava v14.0.1 also uses this implementation so probably hadoop borrowed it from there .. I was about to say "no it doesn't", but then I saw the comment at the top, "This is borrowed and slightly modified from Guava's". It's been in there a long time though (since 2012), so its history is lost. x86 perf is what really matters, though we don't want to be pathologically antisocial to the other CPU arches, not just for the sake of PPC, but for when Arm goes mainstream in the DC. > Replace/improve HBase's byte[] comparator > - > > Key: HBASE-17877 > URL: https://issues.apache.org/jira/browse/HBASE-17877 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Vikas Vishwakarma > Attachments: 17877-1.2.patch, 17877-v2-1.3.patch, 17877-v3-1.3.patch, > 17877-v4-1.3.patch, ByteComparatorJiraHBASE-17877.pdf, > HBASE-17877.branch-1.3.001.patch, HBASE-17877.branch-1.3.002.patch, > HBASE-17877.master.001.patch, HBASE-17877.master.002.patch > > > [~vik.karma] did some extensive tests and found that Hadoop's version is > faster - dramatically faster in some cases. > Patch forthcoming. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
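For reference, the comparator being discussed is a lexicographic comparison of byte[] treating each byte as unsigned. Below is the plain-Java form of that semantics (the Guava/Hadoop implementations referenced above additionally compare eight bytes at a time as longs for speed; this sketch shows only the contract they must preserve):

```java
// Lexicographic, unsigned byte[] comparison: negative, zero, or positive
// as a sorts before, equal to, or after b.
class UnsignedBytesCompare {
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;          // mask to treat the byte as unsigned
            int y = b[i] & 0xff;
            if (x != y) return x - y;
        }
        return a.length - b.length;       // shorter array sorts first on a tie
    }
}
```

The unsigned masking is the part the word-at-a-time versions must get right: 0xff must sort after 0x01, which a naive signed byte comparison gets backwards.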
[jira] [Commented] (HBASE-11045) Replace deprecated method FileSystem#createNonRecursive
[ https://issues.apache.org/jira/browse/HBASE-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299781#comment-15299781 ] Steve Loughran commented on HBASE-11045: BTW, given that HDFS has un-deprecated {{createNonRecursive()}}, what about closing this as a WONTFIX? > Replace deprecated method FileSystem#createNonRecursive > --- > > Key: HBASE-11045 > URL: https://issues.apache.org/jira/browse/HBASE-11045 > Project: HBase > Issue Type: Task >Reporter: Gustavo Anatoly >Assignee: Gustavo Anatoly >Priority: Minor > Fix For: 2.0.0 > > > This change affect directly ProtobufLogWriter#init() associated to > TestHLog#testFailedToCreateHLogIfParentRenamed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11045) Replace deprecated method FileSystem#createNonRecursive
[ https://issues.apache.org/jira/browse/HBASE-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299778#comment-15299778 ] Steve Loughran commented on HBASE-11045: you *cannot* use swift instead of HDFS. It isn't a real filesystem and things will fail dramatically —even if this method was implemented, there are too many other differences. The fact that your attempt is failing this early on, while frustrating, stops you getting deeper into trouble. Sorry. Note that you can't use S3 either, same problem. see: [Object stores vs filesystems](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/introduction.html). > Replace deprecated method FileSystem#createNonRecursive > --- > > Key: HBASE-11045 > URL: https://issues.apache.org/jira/browse/HBASE-11045 > Project: HBase > Issue Type: Task >Reporter: Gustavo Anatoly >Assignee: Gustavo Anatoly >Priority: Minor > Fix For: 2.0.0 > > > This change affect directly ProtobufLogWriter#init() associated to > TestHLog#testFailedToCreateHLogIfParentRenamed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648186#comment-14648186 ] Steve Loughran commented on HBASE-13992: Coverage is an odd metric anyway, because as well as code coverage there's state coverage: ipv6, Windows, timezone=GMT0, locale=turkish - any of which can break things even in code that nominally has 100% coverage. Having tests which generate failure conditions (done here), with test setups that explore the configuration space, is about the best you can get. > Integrate SparkOnHBase into HBase > - > > Key: HBASE-13992 > URL: https://issues.apache.org/jira/browse/HBASE-13992 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Ted Malaska > Fix For: 2.0.0 > > Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, > HBASE-13992.12.patch, HBASE-13992.5.patch, HBASE-13992.6.patch, > HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, > HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, > HBASE-13992.patch.5 > > > This Jira is to ask if SparkOnHBase can find a home inside HBase core. > Here is the github: > https://github.com/cloudera-labs/SparkOnHBase > I am the core author of this project and the license is Apache 2.0 > A blog explaining this project is here > http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/ > A spark Streaming example is here > http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/ > A real customer using this in production is blogged here > http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/ > Please debate and let me know what I can do to make this happen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
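The locale point above is concrete enough to demonstrate in a few lines of plain Java. This is a generic sketch, not HBase code; the class and method names are invented for illustration. A line with nominally 100% coverage under an English locale still breaks once the default locale is Turkish:

```java
import java.util.Locale;

public class LocaleCoverage {
    // Naive header check: passes every test on an en_US JVM,
    // because no-arg toUpperCase() uses the default locale.
    static boolean isTitleHeader(String h) {
        return h.toUpperCase().equals("TITLE");
    }

    public static void main(String[] args) {
        Locale.setDefault(new Locale("tr", "TR"));
        // In a Turkish locale "title".toUpperCase() is "T\u0130TLE"
        // (dotted capital I), so the fully-covered line now fails.
        System.out.println(isTitleHeader("title")); // prints: false
    }
}
```

The usual fix is to pass an explicit locale for protocol strings, e.g. {{toUpperCase(Locale.ROOT)}}.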
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644580#comment-14644580 ] Steve Loughran commented on HBASE-13992: LGTM: I only worry about testability, and this is a good start. More tests will no doubt come over time ... something in Bigtop would be good for the integration testing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643322#comment-14643322 ] Steve Loughran commented on HBASE-13992: New tests look good.
# It's probably best to put artifact versions in the root pom.xml, not the spark one, so there's one single place for dependencies ... this will matter if >1 scala module goes in.
# {{HBaseDStreamFunctionsSuite.scala}} mixes up its variables: it asserts on foo5 but reports foo4 in the failure message.
{code}
assert(foo5.equals("bar"), foo4 + "!=bar")
{code}
ScalaTest lets you use assertResult instead, for an auto-generated message:
{code}
assertResult("bar") { foo5 }
{code}
And you can use {{==}} for a slightly less informative error message, but one which still includes the values on either side.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639697#comment-14639697 ] Steve Loughran commented on HBASE-13992: Were this a project I was a committer on, I'd be mandating the failure tests, as they are the tests most likely to break things. As I'm not an HBase committer, I will leave the opinions to others. At the very least, there needs to be a follow-up JIRA for the extra tests. As Ted notes, they should just throw the standard exceptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639357#comment-14639357 ] Steve Loughran commented on HBASE-13992: There's not much in the way of tests here; in particular, not much in the way of generating failure conditions and validating the outcome. Ideally, there'd be one test to generate each failure condition; the exception handling includes paths which downgrade a failure to a log message, and the tests should verify that such actions are the correct response. At the very least, I'd recommend:
# a test against a non-existent database
# an attempt to work with a table that doesn't exist
# an attempt to read a column that doesn't exist
I'd also make sure test teardown is robust, catching exceptions & downgrading them to logs. That way, if something didn't get set up properly, the root cause of the failure isn't hidden by any exception generated in teardown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
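The teardown advice above can be sketched in generic Java. All names here are invented; this is not SparkOnHBase code. Each resource is closed in turn, and any close-time failure is logged and collected rather than rethrown, so it cannot mask the original setup failure:

```java
import java.util.ArrayList;
import java.util.List;

public class RobustTeardown {
    /** Close all resources; collect failures instead of throwing. */
    public static List<Exception> closeQuietly(AutoCloseable... resources) {
        List<Exception> failures = new ArrayList<>();
        for (AutoCloseable r : resources) {
            if (r == null) {
                continue; // setup may have failed before creating this resource
            }
            try {
                r.close();
            } catch (Exception e) {
                // downgrade to a log message; never hide the original failure
                System.err.println("Ignored failure in teardown: " + e);
                failures.add(e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        AutoCloseable ok = () -> { };
        AutoCloseable broken = () -> { throw new IllegalStateException("close failed"); };
        int failed = closeQuietly(ok, null, broken).size();
        System.out.println("failures=" + failed); // prints: failures=1
    }
}
```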
[jira] [Commented] (HBASE-12006) [JDK 8] KeyStoreTestUtil#generateCertificate fails due to "subject class type invalid"
[ https://issues.apache.org/jira/browse/HBASE-12006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146231#comment-14146231 ] Steve Loughran commented on HBASE-12006: {{sun.security.x509}} will go away in Java 9. This means the test utils may need some more work. > [JDK 8] KeyStoreTestUtil#generateCertificate fails due to "subject class type > invalid" > -- > > Key: HBASE-12006 > URL: https://issues.apache.org/jira/browse/HBASE-12006 > Project: HBase > Issue Type: Bug >Affects Versions: 0.99.0, 2.0.0 >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > > Running tests on Java 8. All unit tests for branch 0.98 pass. On master > branch some variation in the security API is causing a failure in > TestSSLHttpServer: > {noformat} > Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.181 sec <<< > FAILURE! - in org.apache.hadoop.hbase.http.TestSSLHttpServer > org.apache.hadoop.hbase.http.TestSSLHttpServer Time elapsed: 0.181 sec <<< > ERROR! > java.security.cert.CertificateException: Subject class type invalid. > at sun.security.x509.X509CertInfo.setSubject(X509CertInfo.java:888) > at sun.security.x509.X509CertInfo.set(X509CertInfo.java:415) > at > org.apache.hadoop.hbase.http.ssl.KeyStoreTestUtil.generateCertificate(KeyStoreTestUtil.java:94) > at > org.apache.hadoop.hbase.http.ssl.KeyStoreTestUtil.setupSSLConfig(KeyStoreTestUtil.java:246) > at > org.apache.hadoop.hbase.http.TestSSLHttpServer.setup(TestSSLHttpServer.java:72) > org.apache.hadoop.hbase.http.TestSSLHttpServer Time elapsed: 0.181 sec <<< > ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.hbase.http.TestSSLHttpServer.cleanup(TestSSLHttpServer.java:100) > Tests in error: > TestSSLHttpServer.setup:72 » Certificate Subject class type invalid. > TestSSLHttpServer.cleanup:100 NullPointer > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11045) Replace deprecated method FileSystem#createNonRecursive
[ https://issues.apache.org/jira/browse/HBASE-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977194#comment-13977194 ] Steve Loughran commented on HBASE-11045: Personally, I wouldn't have had {{create()}} create any parent directories at all, leaving that as the responsibility of the caller; but for reasons of history, that's not the case... -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11045) Replace deprecated method FileSystem#createNonRecursive
[ https://issues.apache.org/jira/browse/HBASE-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976573#comment-13976573 ] Steve Loughran commented on HBASE-11045: {{FileSystem#createNonRecursive()}} isn't implemented by many filesystems; using it would run the risk of hitting implementations that don't. Is there any barrier to checking that the parent directory exists before calling create? That's essentially what most filesystems would end up doing:
{code}
FSDataOutputStream createNonRecursive(FileSystem fs, Path p) throws IOException {
  if (!fs.exists(p.getParent())) {
    throw new FileNotFoundException("parent does not exist: " + p.getParent());
  }
  return fs.create(p);
}
{code}
It's not atomic, but if you look closely at the source, it's not atomic in most FS implementations anyway, including the native one (mkdirs() isn't atomic there). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10444) NPE seen in logs at tail of fatal shutdown
[ https://issues.apache.org/jira/browse/HBASE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887668#comment-13887668 ] Steve Loughran commented on HBASE-10444: I may have unintentionally deployed 0.96.0 instead; propose closing as cannot reproduce until I can see it again? > NPE seen in logs at tail of fatal shutdown > -- > > Key: HBASE-10444 > URL: https://issues.apache.org/jira/browse/HBASE-10444 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0 > Environment: in 0.98.0 RC1 >Reporter: Steve Loughran >Priority: Minor > > hbase RS logs show an NPE in shutdown; no other info > {code} > 14/01/30 14:18:25 INFO ipc.RpcServer: Stopping server on 57186 > Exception in thread "regionserver57186" java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897) > at java.lang.Thread.run(Thread.java:744) > 14/01/30 14:18:25 ERROR regionserver.HRegionServerCommand > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10444) NPE seen in logs at tail of fatal shutdown
[ https://issues.apache.org/jira/browse/HBASE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887589#comment-13887589 ] Steve Loughran commented on HBASE-10444: commit #50f5a7a, by the look of things, unless I've accidentally been using an older version. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10444) NPE seen in logs at tail of fatal shutdown
[ https://issues.apache.org/jira/browse/HBASE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886619#comment-13886619 ] Steve Loughran commented on HBASE-10444: more logs {code} 14/01/30 14:18:22 FATAL regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [] 14/01/30 14:18:22 INFO regionserver.HRegionServer: STOPPED: Unexpected exception during initialization, aborting 14/01/30 14:18:23 INFO zookeeper.ClientCnxn: Opening socket connection to server ubuntu/192.168.1.132:2181. Will not attempt to authenticate using SASL (unknown error) 14/01/30 14:18:23 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) 14/01/30 14:18:24 INFO zookeeper.ClientCnxn: Opening socket connection to server ubuntu/192.168.1.132:2181. Will not attempt to authenticate using SASL (unknown error) 14/01/30 14:18:24 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068) 14/01/30 14:18:25 INFO ipc.RpcServer: Stopping server on 57186 14/01/30 14:18:25 FATAL regionserver.HRegionServer: ABORTING region server ubuntu,57186,1391091486549: Initialization of RS failed. Hence aborting RS. 
java.io.IOException: Received the shutdown message while waiting. at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:757) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:706) at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:678) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:806) at java.lang.Thread.run(Thread.java:744) 14/01/30 14:18:25 FATAL regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [] 14/01/30 14:18:25 INFO regionserver.HRegionServer: STOPPED: Initialization of RS failed. Hence aborting RS. 14/01/30 14:18:25 INFO ipc.RpcServer: Stopping server on 57186 Exception in thread "regionserver57186" java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897) at java.lang.Thread.run(Thread.java:744) 14/01/30 14:18:25 ERROR regionserver.HRegionServerCommandLine: Region server exiting java.lang.RuntimeException: HRegionServer Aborted at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:66) at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:85) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2336) 14/01/30 14:18:25 INFO regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@1407408 14/01/30 14:18:25 INFO regionserver.HRegionServer: STOPPED: Shutdown hook 14/01/30 14:18:25 INFO regionserver.ShutdownHook: Starting fs shutdown hook thread. 14/01/30 14:18:25 INFO regionserver.ShutdownHook: Shutdown hook finished. 
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
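Reading the trace above, the secondary NPE looks like a shutdown path dereferencing state that initialization never assigned. A minimal, hypothetical illustration of that failure mode and its guard (all names invented; this is not the actual HRegionServer code):

```java
public class ShutdownGuard {
    private Object rpcServices; // stays null if startup aborts before assignment

    /** Shutdown step, guarded so an aborted init cannot turn into an NPE. */
    String stopRpc() {
        if (rpcServices == null) {
            return "rpc never started; nothing to stop";
        }
        return "stopping rpc";
    }

    public static void main(String[] args) {
        ShutdownGuard rs = new ShutdownGuard();
        try {
            throw new IllegalStateException("init failed"); // simulate aborted init
        } catch (RuntimeException e) {
            System.err.println("ABORTING region server: " + e.getMessage());
        } finally {
            // without the null check, this is where the NPE would surface
            System.out.println(rs.stopRpc()); // prints: rpc never started; nothing to stop
        }
    }
}
```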
[jira] [Created] (HBASE-10444) NPE seen in logs at tail of fatal shutdown
Steve Loughran created HBASE-10444: -- Summary: NPE seen in logs at tail of fatal shutdown Key: HBASE-10444 URL: https://issues.apache.org/jira/browse/HBASE-10444 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Environment: in 0.98.0 RC1 Reporter: Steve Loughran Priority: Minor hbase RS logs show an NPE in shutdown; no other info {code} 14/01/30 14:18:25 INFO ipc.RpcServer: Stopping server on 57186 Exception in thread "regionserver57186" java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897) at java.lang.Thread.run(Thread.java:744) 14/01/30 14:18:25 ERROR regionserver.HRegionServerCommand {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency
[ https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866846#comment-13866846 ] Steve Loughran commented on HBASE-10296: ..but that ZK path is used to find the hbase master even if it moves round a cluster -what would happen there? > Replace ZK with a paxos running within master processes to provide better > master failover performance and state consistency > --- > > Key: HBASE-10296 > URL: https://issues.apache.org/jira/browse/HBASE-10296 > Project: HBase > Issue Type: Brainstorming > Components: master, Region Assignment, regionserver >Reporter: Feng Honghua > > Currently master relies on ZK to elect active master, monitor liveness and > store almost all of its states, such as region states, table info, > replication info and so on. And zk also plays as a channel for > master-regionserver communication(such as in region assigning) and > client-regionserver communication(such as replication state/behavior change). > But zk as a communication channel is fragile due to its one-time watch and > asynchronous notification mechanism which together can leads to missed > events(hence missed messages), for example the master must rely on the state > transition logic's idempotence to maintain the region assigning state > machine's correctness, actually almost all of the most tricky inconsistency > issues can trace back their root cause to the fragility of zk as a > communication channel. > Replace zk with paxos running within master processes have following benefits: > 1. better master failover performance: all master, either the active or the > standby ones, have the same latest states in memory(except lag ones but which > can eventually catch up later on). whenever the active master dies, the newly > elected active master can immediately play its role without such failover > work as building its in-memory states by consulting meta-table and zk. > 2. 
better state consistency: master's in-memory states are the only truth > about the system,which can eliminate inconsistency from the very beginning. > and though the states are contained by all masters, paxos guarantees they are > identical at any time. > 3. more direct and simple communication pattern: client changes state by > sending requests to master, master and regionserver talk directly to each > other by sending request and response...all don't bother to using a > third-party storage like zk which can introduce more uncertainty, worse > latency and more complexity. > 4. zk can only be used as liveness monitoring for determining if a > regionserver is dead, and later on we can eliminate zk totally when we build > heartbeat between master and regionserver. > I know this might looks like a very crazy re-architect, but it deserves deep > thinking and serious discussion for it, right? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency
[ https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866467#comment-13866467 ] Steve Loughran commented on HBASE-10296: The Google Chubby paper goes into some detail about why they implemented a Paxos service and not a Paxos library. Perhaps you could persuade the ZK team to rework the code enough that you could reuse it independently of ZK. Implementing a consensus protocol is surprisingly hard, as you have to:
# understand Paxos
# implement it
# prove that your implementation is correct
Unit tests are not enough - talk to the ZK team about what they had to do to show that it works. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency
[ https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865230#comment-13865230 ] Steve Loughran commented on HBASE-10296: One aspect of ZK that is worth remembering is that it lets other apps keep an eye on what is going on. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833639#comment-13833639 ] Steve Loughran commented on HBASE-9892: --- I had a quick look and, while the intricacies of the HBase code escape me, it looks like the Masters get the port info via ZK. Does this propagate as far as the hbase status data you get with {{HBaseAdmin.getClusterStatus()}}? That's where I need to pick it up from - along with the infoserver port of the master itself -steve > Add info port to ServerName to support multi instances in a node > > > Key: HBASE-9892 > URL: https://issues.apache.org/jira/browse/HBASE-9892 > Project: HBase > Issue Type: Improvement >Reporter: Liu Shaohui >Assignee: Liu Shaohui >Priority: Minor > Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, > HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff > > > The full GC time of regionserver with big heap(> 30G ) usually can not be > controlled in 30s. At the same time, the servers with 64G memory are normal. > So we try to deploy multi rs instances(2-3 ) in a single node and the heap of > each rs is about 20G ~ 24G. > Most of the things works fine, except the hbase web ui. The master get the RS > info port from conf, which is suitable for this situation of multi rs > instances in a node. So we add info port to ServerName. > a. at the startup, rs report it's info port to Hmaster. > b, For root region, rs write the servername with info port ro the zookeeper > root-region-server node. > c, For meta regions, rs write the servername with info port to root region > d. For user regions, rs write the servername with info port to meta regions > So hmaster and client can get info port from the servername. > To test this feature, I change the rs num from 1 to 3 in standalone mode, so > we can test it in standalone mode, > I think Hoya(hbase on yarn) will encounter the same problem. Anyone knows > how Hoya handle this problem? 
> PS: There are different formats for servername in zk node and meta table, i > think we need to unify it and refactor the code. -- This message was sent by Atlassian JIRA (v6.1#6144)
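The lookup described in the comment above -walking the cluster status for each server's ports- can be sketched roughly as follows. This is a minimal self-contained sketch: {{ClusterStatus}} and {{ServerName}} here are hand-rolled stubs standing in for what {{HBaseAdmin.getClusterStatus()}} would expose once the info port travels in the server name; they are not the real HBase client classes, and the port numbers are illustrative.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of enumerating servers (and, hypothetically, their info ports)
// from a cluster-status style object. ClusterStatus/ServerName are stubs,
// NOT the real org.apache.hadoop.hbase classes.
public class ClusterStatusSketch {
    static class ServerName {
        final String host; final int rpcPort; final int infoPort;
        ServerName(String host, int rpcPort, int infoPort) {
            this.host = host; this.rpcPort = rpcPort; this.infoPort = infoPort;
        }
    }
    static class ClusterStatus {
        // Stub data: two RS instances on the same host, each with its own ports.
        List<ServerName> getServers() {
            return Arrays.asList(new ServerName("rs1.example", 60020, 60030),
                                 new ServerName("rs1.example", 60021, 60031));
        }
    }

    public static void main(String[] args) {
        // In real client code this object would come from
        // HBaseAdmin.getClusterStatus(), not a stub.
        ClusterStatus status = new ClusterStatus();
        for (ServerName sn : status.getServers()) {
            System.out.println(sn.host + ":" + sn.rpcPort + " ui=" + sn.infoPort);
        }
    }
}
```

The point of the sketch is the shape of the data: once the info port is carried inside the server name, a client only needs the cluster status to find every UI, even with several regionservers per host.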
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815174#comment-13815174 ] Steve Loughran commented on HBASE-9892: --- As Enis says, currently we know the problem is there but don't try to fix it. The issue we have there is not just that YARN may assign >1 region server to the same node (it doesn't currently support anti-affinity in allocation requests), but that someone else may be running their own application, HBase or otherwise, on the same machine. If you hard code a port it can fail -any port. The sole advantage we have is that this will trigger a new container request/review. Because this also affects the masters, we have to leave that UI at port 0 too -which is the worst issue. I would really like to get hold of that via ZK, from where we can bootstrap the rest of the cluster information > Add info port to ServerName to support multi instances in a node > > > Key: HBASE-9892 > URL: https://issues.apache.org/jira/browse/HBASE-9892 > Project: HBase > Issue Type: Improvement > Reporter: Liu Shaohui > Assignee: Liu Shaohui > Priority: Minor > Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, HBASE-9892-0.94-v3.diff > > > The full GC time of a regionserver with a big heap (>30G) usually cannot be kept under 30s; meanwhile, servers with 64G of memory are now the norm. So we try to deploy multiple RS instances (2-3) on a single node, with each RS heap at about 20G~24G. > Most things work fine, except the hbase web UI. The master gets the RS info port from conf, which is not suitable for this situation of multiple RS instances on one node. > So we add the info port to ServerName. > a. At startup, the RS reports its info port to the HMaster. > b. For the root region, the RS writes the servername with info port to the zookeeper root-region-server node. > c. For meta regions, the RS writes the servername with info port to the root region. > d. For user regions, the RS writes the servername with info port to the meta regions. > So the HMaster and clients can get the info port from the servername. > To test this feature, I changed the RS num from 1 to 3 in standalone mode, so we can test it in standalone mode. > I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know how Hoya handles this problem? > PS: There are different formats for the servername in the zk node and the meta table; I think we need to unify them and refactor the code. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9802) A new failover test framework for HBase
[ https://issues.apache.org/jira/browse/HBASE-9802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799098#comment-13799098 ] Steve Loughran commented on HBASE-9802: --- This sounds interesting and potentially very useful beyond just HBase. Hadoop YARN applications are the obvious target, as they need to be written to expect failure, and if they don't get tested, well, they won't work. I ended up doing some basics of this with ssh and reboot operations, but I really wanted something that could talk to an OpenWrt base station and actually generate real network partitions, rather than just simulations. # Accumulo has something similar, though I've not seen it # would it be possible to make this more generic? Even if it starts off in HBase, it could be good to have the option of branching off into its own project -and to allow people downstream to use it even earlier. I'd propose making the core test framework a module that could be picked up and used downstream, precisely to get that cross-application testing > A new failover test framework for HBase > --- > > Key: HBASE-9802 > URL: https://issues.apache.org/jira/browse/HBASE-9802 > Project: HBase > Issue Type: Improvement > Components: test > Affects Versions: 0.94.3 > Reporter: chendihao > Priority: Minor > > Currently HBase uses ChaosMonkey for IT tests and fault injection. It will restart regionservers, force the balancer and perform other actions randomly and periodically. However, we need a more extensible and full-featured framework for our failover testing, and we find ChaosMonkey can't suit our needs since it has the following drawbacks. > 1) Only process-level actions can be simulated; machine-level/hardware-level/network-level actions are not supported. > 2) No data validation before and after the test, so fatal bugs such as those that can cause data inconsistency may be overlooked. > 3) When a failure occurs, we can't reproduce the problem and it is hard to figure out the reason.
> Therefore, we have developed a new framework to satisfy the need for failover testing. We extended ChaosMonkey and implemented functions to validate data and to replay failed actions. Here are the features we added. > 1) Policy/Task/Action abstraction; separating Task from Policy and Action makes it easier to manage and replay a set of actions. > 2) Actions are configurable. We have implemented some actions to cause machine failure and defined the same interface as the original actions. > 3) We validate that data is consistent before and after the failover test to ensure availability and data correctness. > 4) After performing a set of actions, we also check the consistency of the table. > 5) The set of actions that caused a test failure can be replayed, and the reproducibility of actions helps fix the exposed bugs. > Our team has developed this framework and run it for a while. Some bugs were exposed and fixed by running this test framework. Moreover, we have a monitor program which shows the progress of the failover test and makes sure our cluster is as stable as we want. Now we are trying to make it more general and will open-source it later. -- This message was sent by Atlassian JIRA (v6.1#6144)
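The Policy/Task/Action separation the reporter describes could look roughly like the interfaces below. This is a hypothetical sketch of that abstraction, not the actual framework's code; all names and shapes are illustrative.

```java
import java.util.List;

// Hypothetical sketch of a Policy/Task/Action split for failover testing.
// Keeping Task (an ordered action list) separate from Policy (the scheduler)
// is what makes a failed run replayable: re-run the same Task's actions.
public class FailoverSketch {
    interface Action {            // one injectable fault, e.g. kill a regionserver
        String name();
        void perform() throws Exception;
    }
    interface Task {              // an ordered, recordable set of actions
        List<Action> actions();
    }
    interface Policy {            // decides which task to run next, and when
        Task nextTask();
    }

    // Replaying a failed task is just re-running its action list in order.
    static void replay(Task task) throws Exception {
        for (Action a : task.actions()) {
            System.out.println("replaying " + a.name());
            a.perform();
        }
    }

    public static void main(String[] args) throws Exception {
        Action noop = new Action() {
            public String name() { return "noop"; }
            public void perform() { /* a real action would inject a fault here */ }
        };
        replay(() -> List.of(noop));
    }
}
```

A real implementation would persist the executed action sequence (plus seeds and timing) so the exact failure scenario can be replayed later, which is the reproducibility feature item 5 above asks for.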
[jira] [Resolved] (HBASE-9545) NPE when trying to get cluster status on an hbase cluster that isn't there
[ https://issues.apache.org/jira/browse/HBASE-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HBASE-9545. --- Resolution: Duplicate > NPE when trying to get cluster status on an hbase cluster that isn't there > -- > > Key: HBASE-9545 > URL: https://issues.apache.org/jira/browse/HBASE-9545 > Project: HBase > Issue Type: Bug > Components: Client > Environment: 0-95.3 snapshot, commit 943bffc >Reporter: Steve Loughran >Priority: Minor > > As part of some fault injection testing, I'm trying to talk to an > HBaseCluster that isn't there, opening a connection and expecting things to > fail. It turns out you can create an {{HBaseAdmin}} instance, but when you > ask for its cluster status the NPE surfaces -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-9545) NPE when trying to get cluster status on an hbase cluster that isn't there
[ https://issues.apache.org/jira/browse/HBASE-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769346#comment-13769346 ] Steve Loughran commented on HBASE-9545: --- you are right -it goes away on trunk. Marking as duplicate > NPE when trying to get cluster status on an hbase cluster that isn't there > -- > > Key: HBASE-9545 > URL: https://issues.apache.org/jira/browse/HBASE-9545 > Project: HBase > Issue Type: Bug > Components: Client > Environment: 0-95.3 snapshot, commit 943bffc >Reporter: Steve Loughran >Priority: Minor > > As part of some fault injection testing, I'm trying to talk to an > HBaseCluster that isn't there, opening a connection and expecting things to > fail. It turns out you can create an {{HBaseAdmin}} instance, but when you > ask for its cluster status the NPE surfaces -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-9545) NPE when trying to get cluster status on an hbase cluster that isn't there
[ https://issues.apache.org/jira/browse/HBASE-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768522#comment-13768522 ] Steve Loughran commented on HBASE-9545: --- Stack trace {code} java.lang.NullPointerException at org.apache.hadoop.hbase.client.HBaseAdmin$MasterMonitorCallable.close(HBaseAdmin.java:3053) at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3089) at org.apache.hadoop.hbase.client.HBaseAdmin.getClusterStatus(HBaseAdmin.java:2081) at org.apache.hadoop.hoya.yarn.cluster.failures.TestKilledAM.testKilledAM(TestKilledAM.groovy:84) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {code} It looks like {{MasterMonitorCallable.close()}} assumes its {{masterMonitor}} field is never null, but if there is no connection,that isn't true. 
The close() operation should be made a bit more robust, so that it does not hide the underlying RPC failures I expect to see > NPE when trying to get cluster status on an hbase cluster that isn't there > -- > > Key: HBASE-9545 > URL: https://issues.apache.org/jira/browse/HBASE-9545 > Project: HBase > Issue Type: Bug > Components: Client > Environment: 0.95.3 snapshot, commit 943bffc > Reporter: Steve Loughran > Priority: Minor > > As part of some fault injection testing, I'm trying to talk to an HBaseCluster that isn't there, opening a connection and expecting things to fail. It turns out you can create an {{HBaseAdmin}} instance, but when you ask for its cluster status the NPE surfaces -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
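The guard the comment asks for can be sketched like this. {{MonitorCallable}} below is a hand-rolled stand-in for HBaseAdmin's {{MasterMonitorCallable}}, not the real class; it just shows a null-safe close() that won't mask the original connection failure.

```java
// Hypothetical null-safe close() sketch, NOT the actual HBase code.
public class SafeCloseExample {
    static class MonitorCallable {
        // Stays null when the connection was never established,
        // which is exactly the failing case in the stack trace above.
        AutoCloseable masterMonitor;

        /** Returns true only if an underlying monitor was actually closed. */
        boolean close() {
            if (masterMonitor == null) {
                // Nothing to close: do not NPE and hide the real RPC failure.
                return false;
            }
            try {
                masterMonitor.close();
                return true;
            } catch (Exception e) {
                // Swallow close() failures so they cannot mask the
                // primary exception propagating to the caller.
                return false;
            }
        }
    }

    public static void main(String[] args) {
        MonitorCallable never = new MonitorCallable();
        // No NPE even though masterMonitor is null.
        System.out.println("closed=" + never.close());
    }
}
```

With a guard like this, the caller sees the original connection exception rather than a secondary NPE raised during cleanup.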
[jira] [Created] (HBASE-9545) NPE when trying to get cluster status on an hbase cluster that isn't there
Steve Loughran created HBASE-9545: - Summary: NPE when trying to get cluster status on an hbase cluster that isn't there Key: HBASE-9545 URL: https://issues.apache.org/jira/browse/HBASE-9545 Project: HBase Issue Type: Bug Components: Client Environment: 0-95.3 snapshot, commit 943bffc Reporter: Steve Loughran Priority: Minor As part of some fault injection testing, I'm trying to talk to an HBaseCluster that isn't there, opening a connection and expecting things to fail. It turns out you can create an {{HBaseAdmin}} instance, but when you ask for its cluster status the NPE surfaces -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-9294) NPE in /rs-status during RS shutdown
Steve Loughran created HBASE-9294: - Summary: NPE in /rs-status during RS shutdown Key: HBASE-9294 URL: https://issues.apache.org/jira/browse/HBASE-9294 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.95.2 Reporter: Steve Loughran Priority: Minor While hitting reload to see when a kill-initiated RS shutdown would make the Web UI go away, I got a stack trace from an NPE -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-9294) NPE in /rs-status during RS shutdown
[ https://issues.apache.org/jira/browse/HBASE-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747055#comment-13747055 ] Steve Loughran commented on HBASE-9294: --- {code} java.lang.NullPointerException at org.apache.hadoop.hbase.tmpl.regionserver.RSStatusTmplImpl.renderNoFlush(RSStatusTmplImpl.java:163) at org.apache.hadoop.hbase.tmpl.regionserver.RSStatusTmpl.renderNoFlush(RSStatusTmpl.java:172) at org.apache.hadoop.hbase.tmpl.regionserver.RSStatusTmpl.render(RSStatusTmpl.java:163) at org.apache.hadoop.hbase.regionserver.RSStatusServlet.doGet(RSStatusServlet.java:49) at javax.servlet.http.HttpServlet.service(HttpServlet.java:734) at javax.servlet.http.HttpServlet.service(HttpServlet.java:847) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1077) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) {code} > NPE in /rs-status during RS shutdown > > > Key: HBASE-9294 > URL: https://issues.apache.org/jira/browse/HBASE-9294 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.95.2 >Reporter: Steve Loughran >Priority: Minor > > While hitting reload to see when a kill-initiated RS shutdown would make the > Web UI go away, I got a stack trace from an NPE -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-9185) mvn site target fails when building with Maven 3.1
[ https://issues.apache.org/jira/browse/HBASE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736618#comment-13736618 ] Steve Loughran commented on HBASE-9185: --- [~stack] -thanks for the fix. bq. (What you doing messing w/ mvn!) breaking your build. Next question? > mvn site target fails when building with Maven 3.1 > -- > > Key: HBASE-9185 > URL: https://issues.apache.org/jira/browse/HBASE-9185 > Project: HBase > Issue Type: Bug > Components: build > Affects Versions: 0.95.2 > Environment: Apache Maven 3.1.0 > (893ca28a1da9d5f51ac03827af98bb730128f9f2; 2013-06-27 19:15:32-0700) > Java version: 1.6.0_51, vendor: Apple Inc. > Java home: > /Library/Java/JavaVirtualMachines/1.6.0_51-b11-457.jdk/Contents/Home > Default locale: en_US, platform encoding: MacRoman > OS name: "mac os x", version: "10.8.4", arch: "x86_64", family: "mac" > Reporter: Steve Loughran > Assignee: stack > Priority: Minor > Fix For: 0.98.0, 0.95.2 > > Attachments: 9185.txt > > > mvn site fails when building with mvn 3.1 due to various class changes inside maven. They promise that switching to new versions of some mvn modules will result in builds that work in both 3.0.x and 3.1: > [https://cwiki.apache.org/confluence/display/MAVEN/AetherClassNotFound] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-9185) mvn site target fails when building with Maven 3.1
[ https://issues.apache.org/jira/browse/HBASE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735601#comment-13735601 ] Steve Loughran commented on HBASE-9185: --- full log {code} INFO] --- maven-site-plugin:3.2:site (default-site) @ hbase --- [WARNING] Error injecting: org.apache.maven.reporting.exec.DefaultMavenReportExecutor java.lang.NoClassDefFoundError: org/sonatype/aether/graph/DependencyFilter at java.lang.Class.getDeclaredConstructors0(Native Method) at java.lang.Class.privateGetDeclaredConstructors(Class.java:2437) at java.lang.Class.getDeclaredConstructors(Class.java:1863) at com.google.inject.spi.InjectionPoint.forConstructorOf(InjectionPoint.java:245) at com.google.inject.internal.ConstructorBindingImpl.create(ConstructorBindingImpl.java:99) at com.google.inject.internal.InjectorImpl.createUninitializedBinding(InjectorImpl.java:653) at com.google.inject.internal.InjectorImpl.createJustInTimeBinding(InjectorImpl.java:863) at com.google.inject.internal.InjectorImpl.createJustInTimeBindingRecursive(InjectorImpl.java:790) at com.google.inject.internal.InjectorImpl.getJustInTimeBinding(InjectorImpl.java:278) at com.google.inject.internal.InjectorImpl.getBindingOrThrow(InjectorImpl.java:210) at com.google.inject.internal.InjectorImpl.getProviderOrThrow(InjectorImpl.java:986) at com.google.inject.internal.InjectorImpl.getProvider(InjectorImpl.java:1019) at com.google.inject.internal.InjectorImpl.getProvider(InjectorImpl.java:982) at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1032) at org.eclipse.sisu.reflect.AbstractDeferredClass.get(AbstractDeferredClass.java:44) at com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:86) at com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:55) at com.google.inject.internal.ProviderInternalFactory$1.call(ProviderInternalFactory.java:70) at 
com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:100) at org.eclipse.sisu.plexus.lifecycles.PlexusLifecycleManager.onProvision(PlexusLifecycleManager.java:134) at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:109) at com.google.inject.internal.ProvisionListenerStackCallback.provision(ProvisionListenerStackCallback.java:55) at com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:68) at com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:47) at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46) at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1054) at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40) at com.google.inject.Scopes$1$1.get(Scopes.java:59) at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41) at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:997) at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1047) at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:993) at org.eclipse.sisu.locators.LazyBeanEntry.getValue(LazyBeanEntry.java:82) at org.eclipse.sisu.plexus.locators.LazyPlexusBean.getValue(LazyPlexusBean.java:52) at org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:259) at org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:239) at org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:233) at org.apache.maven.plugins.site.AbstractSiteRenderingMojo.getReports(AbstractSiteRenderingMojo.java:229) at org.apache.maven.plugins.site.SiteMojo.execute(SiteMojo.java:121) at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:106) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59) at org.apache.maven.lifecycle.in
[jira] [Commented] (HBASE-9185) mvn site target fails when building with Maven 3.1
[ https://issues.apache.org/jira/browse/HBASE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735606#comment-13735606 ] Steve Loughran commented on HBASE-9185: --- BTW, the command that failed was {code} mvn clean install -DskipTests javadoc:aggregate site assembly:single {code} > mvn site target fails when building with Maven 3.1 > -- > > Key: HBASE-9185 > URL: https://issues.apache.org/jira/browse/HBASE-9185 > Project: HBase > Issue Type: Bug > Components: build >Affects Versions: 0.95.2 > Environment: Apache Maven 3.1.0 > (893ca28a1da9d5f51ac03827af98bb730128f9f2; 2013-06-27 19:15:32-0700) > Java version: 1.6.0_51, vendor: Apple Inc. > Java home: > /Library/Java/JavaVirtualMachines/1.6.0_51-b11-457.jdk/Contents/Home > Default locale: en_US, platform encoding: MacRoman > OS name: "mac os x", version: "10.8.4", arch: "x86_64", family: "mac" >Reporter: Steve Loughran >Priority: Minor > > mvn site fails when building with mvn 3.1 due to various class changes inside > maven. They promise that switching to new versions of some mvn modules will > result in builds that work in both 3.0.x and 3.1: > [https://cwiki.apache.org/confluence/display/MAVEN/AetherClassNotFound] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
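The fix path suggested by the linked AetherClassNotFound wiki page is to move to plugin versions built against Maven 3.1's Eclipse Aether. As a hedged illustration only (the exact version to pin should be taken from that wiki page and the plugin's release notes, not from here), the pom change would look something like:

```xml
<!-- Hypothetical pom.xml fragment: pin a maven-site-plugin release that is
     reported to work with both Maven 3.0.x and 3.1. The version number here
     is illustrative, not a verified recommendation. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-site-plugin</artifactId>
      <version>3.3</version>
    </plugin>
  </plugins>
</build>
```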
[jira] [Created] (HBASE-9185) mvn site target fails when building with Maven 3.1
Steve Loughran created HBASE-9185: - Summary: mvn site target fails when building with Maven 3.1 Key: HBASE-9185 URL: https://issues.apache.org/jira/browse/HBASE-9185 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.95.2 Environment: Apache Maven 3.1.0 (893ca28a1da9d5f51ac03827af98bb730128f9f2; 2013-06-27 19:15:32-0700) Java version: 1.6.0_51, vendor: Apple Inc. Java home: /Library/Java/JavaVirtualMachines/1.6.0_51-b11-457.jdk/Contents/Home Default locale: en_US, platform encoding: MacRoman OS name: "mac os x", version: "10.8.4", arch: "x86_64", family: "mac" Reporter: Steve Loughran Priority: Minor mvn site fails when building with mvn 3.1 due to various class changes inside maven. They promise that switching to new versions of some mvn modules will result in builds that work in both 3.0.x and 3.1: [https://cwiki.apache.org/confluence/display/MAVEN/AetherClassNotFound] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira