[jira] [Created] (HBASE-13233) add hbase-11339 branch to the patch testing script
Jonathan Hsieh created HBASE-13233: -- Summary: add hbase-11339 branch to the patch testing script Key: HBASE-13233 URL: https://issues.apache.org/jira/browse/HBASE-13233 Project: HBase Issue Type: Task Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh adding hbase-11339 to the BRANCH_NAMES so we can use the apache bot to test patches on that branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13234) Improve the obviousness of the download link on hbase.apache.org
Andrew Purtell created HBASE-13234: -- Summary: Improve the obviousness of the download link on hbase.apache.org Key: HBASE-13234 URL: https://issues.apache.org/jira/browse/HBASE-13234 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Update the hbase.apache.org homepage to include a very obvious section describing how a user can Download HBase Software Here with a link. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] Dependency compatibility
On Fri, Mar 13, 2015 at 11:59 AM, Andrew Purtell apurt...@apache.org wrote: There's no reason our HDFS usage should be exposed in the HBase client code, and I think the application classpath feature for YARN in that version can isolate us on the MR side. I was thinking more of the case where we have to bump our version of Guava because our version and Hadoop's version are mutually incompatible, causing compilation failures or runtime failures or both. This was a thing once. Would it be possible to have different dependencies specified for client and server Maven projects? I suppose we could hack this, though it would be ugly. Yes, this is certainly doable in Maven. I think such a change would need to come with documentation of, and changes to, the assumptions in our architecture, namely that the client side code relates to the server side code only via RPC. I'm not sure if this sounds more like a 2.0 thing or a 1.1 / 1.2 thing. -- Sean
Re: Rough goal timelines for 1.1 and 2.0
Do we need to couple decisions for 1.1 and 2.0 in the same discussion? On Fri, Mar 13, 2015 at 10:16 AM, Sean Busbey bus...@cloudera.com wrote: I think the last time this came up the answer to when? was * 1.1 in time for phoenix 5 * 2.0 later than 6 months and sooner than 6 years from 1.0 Can we discuss some goal post dates for these versions? I'd like to help Allen get HBASE-13231 (shell script update) done, and he's wondering how much he needs to prioritize it based on our expectations for when things come out. What do folks think of initial RC for 1.1 at the end of June and initial RC for 2.0 around November? Hopefully the 1.1 RC process will take ~1 month and the 2.0 ~2 months. That would have 2.0 come out just around a year after 1.0. -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: hbase security issue
You’re going to need to set up a trust in Kerberos, either a one-way trust or a two-way trust. YMMV. It's not as simple as just setting up SSL; you can set up SSL to encrypt the traffic between clusters, however the clusters themselves are not secured. HTH On Mar 12, 2015, at 4:39 PM, Vladimir Rodionov vladrodio...@gmail.com wrote: Thanks, Jerry. I think webhdfs is preferable since it is natively supported by hdfs (name node and data nodes) and traffic does not pass through a single gateway? Found this link on how to set up webhdfs over SSL: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_Security_Guide/content/ch_wire-webhdfs-mr-yarn.html Cool, if it works :). -Vlad On Thu, Mar 12, 2015 at 2:24 PM, Jerry He jerry...@gmail.com wrote: Hi, Vladimir Hope I understand your question correctly. If both the local cluster and the remote cluster are Kerberos enabled, ExportSnapshot from local to remote will work as long as both clusters' Kerberos have been set up in a way that they understand each other. If the remote cluster's httpfs/webhdfs port is protected by https security, after you set up the certificate on the client side, you will be able to talk to the remote port with SSL protection. Jerry On Thu, Mar 12, 2015 at 1:48 PM, Vladimir Rodionov vladrodio...@gmail.com wrote: You can also specify the remote target with a httpfs or webhdfs url, with which you can then leverage SSL on the transport. What if the remote cluster has security enabled? Will it work? -Vlad On Thu, Mar 12, 2015 at 1:39 PM, Jerry He jerry...@gmail.com wrote: ExportSnapshot does not use DistCp but directly uses the FileSystem API to copy, as Vladimir mentioned. But ExportSnapshot supports exporting to a remote target cluster. Give the full hdfs url. You can also specify the remote target with a httpfs or webhdfs url, with which you can then leverage SSL on the transport. You can also copy to the local cluster and use DistCp to copy to the remote cluster. Jerry On Thu, Mar 12, 2015 at 12:28 PM, Vladimir Rodionov vladrodio...@gmail.com wrote: No, ExportSnapshot does not use DistCp; it runs its own M/R job to copy data over to a new destination. In a map task it uses the HDFS API to create/write data to a new destination. Therefore, the easiest way to secure communication during this operation is to use secure HDFS transport. http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-3-1/CDH4-Security-Guide/cdh4sg_topic_14_2.html but there is a caveat ... ExportSnapshot does not support external cluster configuration - you can't provide a path to an external cluster config dir. This seems like a good feature request. -Vlad On Thu, Mar 12, 2015 at 10:38 AM, Akmal Abbasov akmal.abba...@icloud.com wrote: Hi, I am new to Hadoop and HBase. I have an HBase cluster in one datacenter, and I need to create a backup in the second one. Currently the second HBase cluster is ready, and I would like to import data from the first cluster. I would like to use the exportSnapshot tool for this; I’ve tried it on my test environment, and it worked well. But, since now I am going to export to a different cluster in a different datacenter, I would like to be sure that my data is secure. So how can I make exportSnapshot secure? As far as I understood, exportSnapshot uses the distcp tool to copy a snapshot to the destination cluster, so in this case is it enough to configure distcp? Thank you! The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com
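To make the ExportSnapshot discussion above concrete, here is a minimal sketch of driving the tool programmatically against a webhdfs-over-SSL target. The snapshot name, host, port, and path are placeholders, and it assumes an HBase version in which ExportSnapshot implements the Hadoop Tool interface; the equivalent command line would be hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot ... -copy-to ...

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.snapshot.ExportSnapshot;
import org.apache.hadoop.util.ToolRunner;

public class ExportToRemote {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // swebhdfs:// is webhdfs over SSL; plain webhdfs:// leaves the transport
    // unencrypted. Host, port and target path are placeholders.
    int rc = ToolRunner.run(conf, new ExportSnapshot(), new String[] {
        "-snapshot", "my_snapshot",
        "-copy-to", "swebhdfs://remote-nn.example.com:50470/hbase",
        "-mappers", "16"
    });
    System.exit(rc);
  }
}
{code}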
Re: Rough goal timelines for 1.1 and 2.0
That was my question.. We can discuss them independently? Or is there a reason not to? On Fri, Mar 13, 2015 at 11:10 AM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 12:31 PM, Andrew Purtell apurt...@apache.org wrote: Do we need to couple decisions for 1.1 and 2.0 in the same discussion? Like what? Interface changes for Phoenix maybe? -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Rough goal timelines for 1.1 and 2.0
The only reason I can think of to make decisions now would be if we want to ensure we have consensus for the changes for Phoenix and enough time to implement them. Given that AFAIK it's those changes that'll drive having a 1.1 release, seems prudent. But I haven't been tracking the changes lately. I think we're all in agreement that something needs to be done, and that HBase 1.1 and Phoenix 5 are the places to do it. Probably it won't be contentious to just decide as changes are ready? -- Sean On Mar 13, 2015 1:28 PM, Andrew Purtell apurt...@apache.org wrote: That was my question.. We can discuss them independently? Or is there a reason not to? On Fri, Mar 13, 2015 at 11:10 AM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 12:31 PM, Andrew Purtell apurt...@apache.org wrote: Do we need to couple decisions for 1.1 and 2.0 in the same discussion? Like what? Interface changes for Phoenix maybe? -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: [DISCUSS] Dependency compatibility
I think we can solve this generally for Hadoop 2.6.0+. There's no reason our HDFS usage should be exposed in the HBase client code, and I think the application classpath feature for YARN in that version can isolate us on the MR side. I am willing to do this work in time for 1.1. Realistically I don't know the timeline for that version yet. If it turns out the work is more involved or my time is more constrained than I think, I'm willing to accept promise weakening as a practical matter. HBase-1.x series SHOULD work with Hadoop versions as old as 2.2. That is what we promise for 1.x series. So solving the problem in Hadoop-2.6+ will not solve it for 1.1. It is great if we can help Hadoop with classloading issues, do shading there, and do shading in HBase and reduce the dependencies etc. However, since we cannot do these in HBase-1.x series (we cannot shade deps in the same manner), I do not see a way to get around this other than what I propose. We have discussed many of the compat dimensions before we adopted them in PMC. For some of those (like binary compat), we decided that we cannot support them between minor versions in 1.x, so we decided 'false' on that dimension. For these we explicitly decided that when we can realistically have more guarantees, we can start supporting this dimension (client binary compat in minor versions) and have 2.x or later support those. I see the dependency compat dimension in the same vein. It is clear (at least to me) that we cannot pragmatically support any dep compat in 1.x series. If you/we can make all the necessary changes in Hadoop and HBase, we can reintroduce it. Until then though, I would rather not block any progress and drop the support. As you said the timeline is not clear, so why are we cornering ourselves (especially for 1.x series) over this? I'd be much more comfortable weakening our dependency promises for coprocessors than doing it in general. Folks running coprocessors should already be more risk tolerant and familiar with our internals. For upstreams that don't have the leverage on us of Hadoop, we solve this problem simply by not updating dependencies that we can't trust to not break our downstreams. I would be disappointed to see a VOTE thread. That means we failed to reach consensus and needed to fall back to process to resolve differences. That's fair. What about the wider audience issue on user@? There's no reason our DISCUSS threads couldn't go there as well. Why don't we do the doc update and call it a day? I've been burned by dependency changes in projects I rely on many times in the past, usually over changes in code sections that folks didn't think were likely to be used. So I'm very willing to do work now to save downstream users of HBase that same headache. -- Sean
[jira] [Created] (HBASE-13235) Revisit the security auditing semantics.
Srikanth Srungarapu created HBASE-13235: --- Summary: Revisit the security auditing semantics. Key: HBASE-13235 URL: https://issues.apache.org/jira/browse/HBASE-13235 Project: HBase Issue Type: Improvement Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu More specifically, the following things need a closer look. (Will include more based on feedback and/or suggestions) * Table name (say test) instead of the fully qualified table name (default:test) being used. * Right now, the scope is used similarly to the arguments for the operation. It would be better to decouple the arguments for the operation from the scope involved in checking. E.g., for createTable, we have the following audit log {code} Access denied for user esteban; reason: Insufficient permissions; remote address: /10.20.30.1; request: createTable; context: (user=srikanth@XXX, scope=default, action=CREATE) {code} The scope was rightly being used as the default namespace, but we're missing information like the operation params for CREATE, which we used to log prior to HBASE-12511. Would love to hear inputs on this! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Rough goal timelines for 1.1 and 2.0
I think the last time this came up the answer to when? was * 1.1 in time for phoenix 5 * 2.0 later than 6 months and sooner than 6 years from 1.0 Can we discuss some goal post dates for these versions? I'd like to help Allen get HBASE-13231 (shell script update) done, and he's wondering how much he needs to prioritize it based on our expectations for when things come out. What do folks think of initial RC for 1.1 at the end of June and initial RC for 2.0 around November? Hopefully the 1.1 RC process will take ~1 month and the 2.0 ~2 months. That would have 2.0 come out just around a year after 1.0. -- Sean
Re: Rough goal timelines for 1.1 and 2.0
On Fri, Mar 13, 2015 at 12:31 PM, Andrew Purtell apurt...@apache.org wrote: Do we need to couple decisions for 1.1 and 2.0 in the same discussion? Like what? Interface changes for Phoenix maybe? -- Sean
[jira] [Resolved] (HBASE-13232) ConnectionManger : Batch pool threads and metaLookup pool threads should use different name pattern
[ https://issues.apache.org/jira/browse/HBASE-13232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John resolved HBASE-13232. Resolution: Fixed Hadoop Flags: Reviewed Thanks Nick. Pushed the trivial change to master and branch-1 ConnectionManger : Batch pool threads and metaLookup pool threads should use different name pattern --- Key: HBASE-13232 URL: https://issues.apache.org/jira/browse/HBASE-13232 Project: HBase Issue Type: Bug Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Trivial Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13232.patch This is a small issue that happened with HBASE-13036. Different names are passed to getThreadPool as nameHint but the hint is not being used. Found this small issue while checking HBASE-13219. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
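To illustrate the fix's intent (a sketch, not the committed patch): wire the nameHint into each pool's ThreadFactory so batch and metaLookup threads are distinguishable in jstack output. The pool sizes and name strings below are illustrative.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedPools {
  static ThreadFactory named(final String nameHint) {
    final AtomicInteger seq = new AtomicInteger();
    return new ThreadFactory() {
      public Thread newThread(Runnable r) {
        Thread t = new Thread(r);
        // Without the hint both pools would produce identically named
        // threads, making thread dumps ambiguous.
        t.setName(nameHint + "-pool-" + seq.incrementAndGet());
        t.setDaemon(true);
        return t;
      }
    };
  }

  public static void main(String[] args) {
    ExecutorService batchPool = Executors.newFixedThreadPool(4, named("hconnection-batch"));
    ExecutorService metaLookupPool = Executors.newFixedThreadPool(2, named("hconnection-metaLookup"));
    batchPool.shutdown();
    metaLookupPool.shutdown();
  }
}
{code}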
Re: hbase security issue
Nope, goes beyond a VPN. Securing a cluster can be a very painful task. On Mar 13, 2015, at 6:05 AM, Akmal Abbasov akmal.abba...@icloud.com wrote: Hi Wilm, My initial choice was to use VPN, but I couldn’t find any related information. Hi, putting the many good hints in this thread aside ... isn't this more a question of network deployment than a question of hbase or hadoop features? I think a more general plan would be the usage of some VPN channel technology. By this plan a) the data is transferred securely b) the machines are not accessible from the open internet. As database servers shouldn't be located on the open internet, you already have to use some sort of protection against the rest of the world. A VPN approach would work for distcp, exportSnapshot, cluster replication, or the plain file system (for scp or such). Could you please explain in a bit more detail how to get exportSnapshot to go through the VPN. Thank you. And, I know that sounds like a joke, perhaps sending it by mail could be a good plan. E.g. 10TB in one day (postal service is fast here ;) ), this would be a throughput of around 115 MB per second and would be super safe (pgp encryption is a good plan nevertheless). Furthermore I think that's a question for hbase-user. Best wishes Wilm On 12.03.2015 at 18:38, Akmal Abbasov wrote: Hi, I am new to Hadoop and HBase. I have an HBase cluster in one datacenter, and I need to create a backup in the second one. Currently the second HBase cluster is ready, and I would like to import data from the first cluster. I would like to use the exportSnapshot tool for this; I’ve tried it on my test environment, and it worked well. But, since now I am going to export to a different cluster in a different datacenter, I would like to be sure that my data is secure. So how can I make exportSnapshot secure? As far as I understood, exportSnapshot uses the distcp tool to copy a snapshot to the destination cluster, so in this case is it enough to configure distcp? Thank you! The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com
[jira] [Resolved] (HBASE-13234) Improve the obviousness of the download link on hbase.apache.org
[ https://issues.apache.org/jira/browse/HBASE-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-13234. Resolution: Fixed Hadoop Flags: Reviewed Pushed site source change to master. Regenerated site using 'mvn site' and committed to SVN. Improve the obviousness of the download link on hbase.apache.org Key: HBASE-13234 URL: https://issues.apache.org/jira/browse/HBASE-13234 Project: HBase Issue Type: Task Components: documentation Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0 Attachments: HBASE-13234.patch, screenshot.png Update the hbase.apache.org homepage to include a very obvious section describing how a user can Download HBase Software Here with a link. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13236) Clean up m2e-related warnings/errors from poms
Josh Elser created HBASE-13236: -- Summary: Clean up m2e-related warnings/errors from poms Key: HBASE-13236 URL: https://issues.apache.org/jira/browse/HBASE-13236 Project: HBase Issue Type: Improvement Components: build Reporter: Josh Elser Priority: Minor Fix For: 2.0.0, 1.1.0 Pulled down HBase, imported into Eclipse (either directly with m2eclipse or by running {{mvn eclipse:eclipse}} to generate the projects), and this results in a bunch of red due to executions/goals of certain plugins being unable to run in the context of eclipse. The lifecycle-mapping plugin can be used to get around these errors (and already exists in the pom). Add more mappings to the configuration so that a fresh import into Eclipse is not hindered by a bunch of 'false' errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13237) Improve trademark marks on the hbase.apache.org homepage
Andrew Purtell created HBASE-13237: -- Summary: Improve trademark marks on the hbase.apache.org homepage Key: HBASE-13237 URL: https://issues.apache.org/jira/browse/HBASE-13237 Project: HBase Issue Type: Task Components: documentation Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0 Ensure trademark marks are next to first and prominent uses of HBase on the hbase.apache.org homepage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13238) Time out locks and abort if HDFS is wedged
Andrew Purtell created HBASE-13238: -- Summary: Time out locks and abort if HDFS is wedged Key: HBASE-13238 URL: https://issues.apache.org/jira/browse/HBASE-13238 Project: HBase Issue Type: Brainstorming Reporter: Andrew Purtell This is a brainstorming issue on the topic of timing out locks and aborting if HDFS is wedged. We had a minor production incident where a region was unable to close after 24 hours. The CloseRegionHandler was waiting for a write lock on the ReentrantReadWriteLock we take in HRegion#doClose. There were outstanding read locks. Three other threads were stuck in scanning, all blocked on the same DFSInputStream. Two were blocked in DFSInputStream#getFileLength, the third was waiting in epoll from SocketIOWithTimeout$SelectorPool#select with an apparent infinite timeout from PacketReceiver#readChannelFully. This is similar to other issues we have seen before, in the context of the region wanting to finish a compaction but being unable to due to some HDFS issue causing the reader to become extremely slow if not wedged. The Hadoop version was 2.3 (specifically 2.3 CDH 5.0.1), and we are planning to upgrade, but [~lhofhansl] and I were discussing the issue in general and wonder if we should not be timing out locks such as the ReentrantReadWriteLock, and if so, abort the regionserver. In this case that would have caused recovery and reassignment of the region in question and we would not have had a prolonged availability problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
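To make the proposal concrete, a minimal sketch of the idea, assuming a configurable timeout; abort() below stands in for the regionserver abort path, and none of this is actual HRegion code.

{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TimedCloseLock {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  void doClose(long timeoutMs) throws InterruptedException {
    // tryLock with a deadline instead of an indefinite lock(): if readers
    // are wedged on a stuck DFSInputStream we fail fast instead of hanging.
    if (!lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
      abort("could not acquire close lock within " + timeoutMs + "ms");
      return;
    }
    try {
      // ... close stores, drain scanners, etc.
    } finally {
      lock.writeLock().unlock();
    }
  }

  private void abort(String reason) {
    // On a regionserver this would trigger recovery and reassignment of the
    // region instead of a prolonged availability problem.
    throw new RuntimeException("Aborting: " + reason);
  }
}
{code}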
Re: Status of Huawei's 2' Indexing?
When I made that remark I was thinking of a recent discussion we had at a joint Phoenix and HBase developer meetup. The difference of opinion was certainly civilized. (smile) I'm not aware of any specific written discussion, it may or may not exist. I'm pretty sure a revival of HBASE-9203 would attract some controversy, but let me be clearer this time than I was before that this is just my opinion, FWIW. On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: I saw that it was added to their project. I’m really not keen on bringing in all the RDBMS apparatus on top of hbase, so I decided to follow other avenues first (like trying to patch 0.98, for better or worse.) That Phoenix article seems like a good breakdown of the various indexing architectures. HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as are most of them, it seems) so I didn’t know there were these differences of opinion. Did I miss the mailing list thread where the architectural differences were discussed? -j On 3/12/15, 5:22 PM, Andrew Purtell apurt...@apache.org wrote: There are some substantial architectural differences of opinion among the community on this feature as I understand it, so it's unlikely that JIRA will ever see a commit without a lot more work, if ever. A similar feature was later introduced into Apache Phoenix, which in this context may best be described as an extension package for HBase offering a suite of relational data management features. You may want to check out http://phoenix.apache.org/secondary_indexing.html and https://issues.apache.org/jira/browse/PHOENIX-933 for background. On Thu, Mar 12, 2015 at 1:39 PM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: Hi, I’ve been looking over the Jira tickets for the secondary indexing mechanism Huawei had started to integrate back in 2013 (see https://issues.apache.org/jira/browse/HBASE-10222). The code was developed against 0.94 and it seems like a lot of work was done — but then it suddenly stops (the last update to HBASE-10222, the ticket for the work that actually adds secondary indexes, was a bit over a year ago. The last update for the load balancer work was from early last fall.) Is there work on this that I don’t see? I understand I can run this using Huawei’s code for 0.94 but I was hoping for a more recent hbase build. And I’ve tried applying the patches in HBASE-10222 (hope springs eternal); naturally there were some failures. I thought I’d ask here before trying to work through the failed hunks — and ask if you think that’s even a good idea in the first place. Thanks for your input! -j -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
[jira] [Created] (HBASE-13240) add an exemption to test-patch for build-only changes.
Sean Busbey created HBASE-13240: --- Summary: add an exemption to test-patch for build-only changes. Key: HBASE-13240 URL: https://issues.apache.org/jira/browse/HBASE-13240 Project: HBase Issue Type: Improvement Components: build Reporter: Sean Busbey Assignee: Sean Busbey Priority: Minor we've had a couple of patches lately that got pinged for no tests, but they were build-only changes. expand the exemption list from docs to also include changes just to build infra. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13241) Add tests for group level grants
Sean Busbey created HBASE-13241: --- Summary: Add tests for group level grants Key: HBASE-13241 URL: https://issues.apache.org/jira/browse/HBASE-13241 Project: HBase Issue Type: Improvement Components: security Reporter: Sean Busbey Priority: Critical We need to have tests for group-level grants for various scopes. ref: HBASE-13239 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13242) TestPerColumnFamilyFlush.testFlushingWhenLogRolling hung
zhangduo created HBASE-13242: Summary: TestPerColumnFamilyFlush.testFlushingWhenLogRolling hung Key: HBASE-13242 URL: https://issues.apache.org/jira/browse/HBASE-13242 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13239) Hbase grants at specific column level does not work for Groups
Jaymin Patel created HBASE-13239: Summary: Hbase grants at specific column level does not work for Groups Key: HBASE-13239 URL: https://issues.apache.org/jira/browse/HBASE-13239 Project: HBase Issue Type: Bug Components: hbase Affects Versions: 0.98.4 Reporter: Jaymin Patel Performing a grant command for a specific column in a table to a specific group does not produce the needed results. However, when a specific user is mentioned (instead of a group name) in the grant command, the grant becomes effective. Steps to Reproduce: 1) using the super-user, issue a table/column family/column level grant to a group 2) login as a user (part of the above group) and scan the table. It does not return any results 3) using the super-user, issue a table/column family/column level grant to a specific user (instead of the group) 4) login as that specific user and scan the table. It produces correct results, i.e. returns only the columns where the user has select privileges -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: hbase security issue
Hi Jerry, Hi, Vladimir Hope I understand your question correctly. If both the local cluster and the remote cluster are Kerberos enabled, ExportSnapshot from local to remote will work as long as both clusters' Kerberos have been set up in a way that they understand each other. Where can I find more information about this? If the remote cluster's httpfs/webhdfs port is protected by https security, after you set up the certificate on the client side, you will be able to talk to the remote port with SSL protection. Could you please provide more information about how I can securely transfer a snapshot from one datacenter to another using the technique you described. Thank you. You can also specify the remote target with a httpfs or webhdfs url, with which you can then leverage SSL on the transport. What if the remote cluster has security enabled? Will it work? -Vlad On Thu, Mar 12, 2015 at 1:39 PM, Jerry He jerry...@gmail.com wrote: ExportSnapshot does not use DistCp but directly uses the FileSystem API to copy, as Vladimir mentioned. But ExportSnapshot supports exporting to a remote target cluster. Give the full hdfs url. You can also specify the remote target with a httpfs or webhdfs url, with which you can then leverage SSL on the transport. You can also copy to the local cluster and use DistCp to copy to the remote cluster. Jerry On Thu, Mar 12, 2015 at 12:28 PM, Vladimir Rodionov vladrodio...@gmail.com wrote: No, ExportSnapshot does not use DistCp; it runs its own M/R job to copy data over to a new destination. In a map task it uses the HDFS API to create/write data to a new destination. Therefore, the easiest way to secure communication during this operation is to use secure HDFS transport. http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-3-1/CDH4-Security-Guide/cdh4sg_topic_14_2.html but there is a caveat ... ExportSnapshot does not support external cluster configuration - you can't provide a path to an external cluster config dir. This seems like a good feature request. -Vlad On Thu, Mar 12, 2015 at 10:38 AM, Akmal Abbasov akmal.abba...@icloud.com wrote: Hi, I am new to Hadoop and HBase. I have an HBase cluster in one datacenter, and I need to create a backup in the second one. Currently the second HBase cluster is ready, and I would like to import data from the first cluster. I would like to use the exportSnapshot tool for this; I’ve tried it on my test environment, and it worked well. But, since now I am going to export to a different cluster in a different datacenter, I would like to be sure that my data is secure. So how can I make exportSnapshot secure? As far as I understood, exportSnapshot uses the distcp tool to copy a snapshot to the destination cluster, so in this case is it enough to configure distcp? Thank you!
Re: hbase security issue
Hi Talat, I was considering replication, but decided to start with snapshots. Moreover, there are some drawbacks with replication, like propagation of user error, etc. Also I need a secure connection between data-centers, and I can't find information about this. On 13 Mar 2015, at 05:45, Talat Uyarer ta...@uyarer.com wrote: Hi Akmal, Why do you not use Cluster Replication? [1] http://hbase.apache.org/book.html#_cluster_replication On Mar 12, 2015 11:40 PM, Vladimir Rodionov vladrodio...@gmail.com wrote: Thanks, Jerry. I think webhdfs is preferable since it is natively supported by hdfs (name node and data nodes) and traffic does not pass through a single gateway? Found this link on how to set up webhdfs over SSL: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_Security_Guide/content/ch_wire-webhdfs-mr-yarn.html Cool, if it works :). -Vlad On Thu, Mar 12, 2015 at 2:24 PM, Jerry He jerry...@gmail.com wrote: Hi, Vladimir Hope I understand your question correctly. If both the local cluster and the remote cluster are Kerberos enabled, ExportSnapshot from local to remote will work as long as both clusters' Kerberos have been set up in a way that they understand each other. If the remote cluster's httpfs/webhdfs port is protected by https security, after you set up the certificate on the client side, you will be able to talk to the remote port with SSL protection. Jerry On Thu, Mar 12, 2015 at 1:48 PM, Vladimir Rodionov vladrodio...@gmail.com wrote: You can also specify the remote target with a httpfs or webhdfs url, with which you can then leverage SSL on the transport. What if the remote cluster has security enabled? Will it work? -Vlad On Thu, Mar 12, 2015 at 1:39 PM, Jerry He jerry...@gmail.com wrote: ExportSnapshot does not use DistCp but directly uses the FileSystem API to copy, as Vladimir mentioned. But ExportSnapshot supports exporting to a remote target cluster. Give the full hdfs url. You can also specify the remote target with a httpfs or webhdfs url, with which you can then leverage SSL on the transport. You can also copy to the local cluster and use DistCp to copy to the remote cluster. Jerry On Thu, Mar 12, 2015 at 12:28 PM, Vladimir Rodionov vladrodio...@gmail.com wrote: No, ExportSnapshot does not use DistCp; it runs its own M/R job to copy data over to a new destination. In a map task it uses the HDFS API to create/write data to a new destination. Therefore, the easiest way to secure communication during this operation is to use secure HDFS transport. http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-3-1/CDH4-Security-Guide/cdh4sg_topic_14_2.html but there is a caveat ... ExportSnapshot does not support external cluster configuration - you can't provide a path to an external cluster config dir. This seems like a good feature request. -Vlad On Thu, Mar 12, 2015 at 10:38 AM, Akmal Abbasov akmal.abba...@icloud.com wrote: Hi, I am new to Hadoop and HBase. I have an HBase cluster in one datacenter, and I need to create a backup in the second one. Currently the second HBase cluster is ready, and I would like to import data from the first cluster. I would like to use the exportSnapshot tool for this; I’ve tried it on my test environment, and it worked well. But, since now I am going to export to a different cluster in a different datacenter, I would like to be sure that my data is secure. So how can I make exportSnapshot secure? As far as I understood, exportSnapshot uses the distcp tool to copy a snapshot to the destination cluster, so in this case is it enough to configure distcp? Thank you!
[jira] [Created] (HBASE-13226) Document enable_table_replication and disable_table_replication shell commands
Ashish Singhi created HBASE-13226: - Summary: Document enable_table_replication and disable_table_replication shell commands Key: HBASE-13226 URL: https://issues.apache.org/jira/browse/HBASE-13226 Project: HBase Issue Type: Sub-task Components: documentation Reporter: Ashish Singhi Assignee: Ashish Singhi Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: hbase security issue
Hi, putting the many good hints in this thread aside ... isn't this more a question of network deployment than a question of hbase or hadoop features? I think a more general plan would be the usage of some VPN channel technology. By this plan a) the data is transferred securely b) the machines are not accessible from the open internet. As database servers shouldn't be located on the open internet, you already have to use some sort of protection against the rest of the world. A VPN approach would work for distcp, exportSnapshot, cluster replication, or the plain file system (for scp or such). And, I know that sounds like a joke, perhaps sending it by mail could be a good plan. E.g. 10TB in one day (postal service is fast here ;) ), this would be a throughput of around 115 MB per second and would be super safe (pgp encryption is a good plan nevertheless). Furthermore I think that's a question for hbase-user. Best wishes Wilm On 12.03.2015 at 18:38, Akmal Abbasov wrote: Hi, I am new to Hadoop and HBase. I have an HBase cluster in one datacenter, and I need to create a backup in the second one. Currently the second HBase cluster is ready, and I would like to import data from the first cluster. I would like to use the exportSnapshot tool for this; I’ve tried it on my test environment, and it worked well. But, since now I am going to export to a different cluster in a different datacenter, I would like to be sure that my data is secure. So how can I make exportSnapshot secure? As far as I understood, exportSnapshot uses the distcp tool to copy a snapshot to the destination cluster, so in this case is it enough to configure distcp? Thank you!
[jira] [Created] (HBASE-13229) Bug compatibility validation to start local-regionservers.sh and local-master-backup.sh
Gustavo Anatoly created HBASE-13229: --- Summary: Bug compatibility validation to start local-regionservers.sh and local-master-backup.sh Key: HBASE-13229 URL: https://issues.apache.org/jira/browse/HBASE-13229 Project: HBase Issue Type: Bug Components: scripts Reporter: Gustavo Anatoly Assignee: Gustavo Anatoly Priority: Minor Running the following line, using /bin/sh: $ bin/local-regionservers.sh --config ~/hbase-dev/hbase-conf/conf/ start 1 2 3 4 5 Produces the output below: bin/local-regionservers.sh: 55: bin/local-regionservers.sh: [[: not found Invalid argument bin/local-regionservers.sh: 55: bin/local-regionservers.sh: [[: not found Invalid argument bin/local-regionservers.sh: 55: bin/local-regionservers.sh: [[: not found Invalid argument bin/local-regionservers.sh: 55: bin/local-regionservers.sh: [[: not found Invalid argument bin/local-regionservers.sh: 55: bin/local-regionservers.sh: [[: not found Invalid argument Considering: {code} if [[ $i =~ ^[0-9]+$ ]]; then run_master $cmd $i else echo Invalid argument fi {code} The reason is that the regex operator =~ is not compatible with /bin/sh but works when run under /bin/bash: $ bash -x bin/local-regionservers.sh --config ~/hbase-dev/hbase-conf/conf/ start 1 2 3 4 5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: hbase security issue
Hi, On 13.03.2015 at 12:05, Akmal Abbasov wrote: My initial choice was to use VPN, but I couldn’t find any related information. Well, I think that's because neither hbase nor hadoop has anything to do with vpn. It's completely independent. Just configure a VPN for all the machines in cluster A and cluster B (thus at least master A can access master B and vice versa*) and do the normal hbase stuff, e.g. cluster replication or exportSnapshot. It's no different from a VPN setup for mysql, or cassandra, etc. Best wishes, Wilm *) of course the vpn is much more complicated in a real world application, as the application server has to access the resp. masters, all machines in one cluster, and the clients have to access the application servers etc.
[jira] [Created] (HBASE-13227) LoadIncrementalHFile should skip non-files inside a possible family-dir
Matteo Bertozzi created HBASE-13227: --- Summary: LoadIncrementalHFile should skip non-files inside a possible family-dir Key: HBASE-13227 URL: https://issues.apache.org/jira/browse/HBASE-13227 Project: HBase Issue Type: Bug Components: Client, mapreduce Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Attachments: HBASE-13227-v0.patch if we have random files/dirs inside the bulkload family dir, we should try to skip them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
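A rough sketch of the intended check (not the attached patch): filter the family dir listing down to plain files before treating entries as HFiles.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FamilyDirScan {
  static void visitHFiles(FileSystem fs, Path familyDir) throws IOException {
    for (FileStatus stat : fs.listStatus(familyDir)) {
      if (!stat.isFile()) {
        // Stray subdirectories or other non-file entries are skipped
        // instead of failing the whole bulk load.
        continue;
      }
      // hand stat.getPath() to the per-family bulk load here
    }
  }
}
{code}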
Re: hbase security issue
Hi Wilm, My initial choice was to use VPN, but I couldn’t find any related information. Hi, putting the many good hints in this thread aside ... isn't this more a question of network deployment than a question of hbase or hadoop features? I think a more general plan would be the usage of some VPN channel technology. By this plan a) the data is transferred securely b) the machines are not accessible from the open internet. As database servers shouldn't be located on the open internet, you already have to use some sort of protection against the rest of the world. A VPN approach would work for distcp, exportSnapshot, cluster replication, or the plain file system (for scp or such). Could you please explain in a bit more detail how to get exportSnapshot to go through the VPN. Thank you. And, I know that sounds like a joke, perhaps sending it by mail could be a good plan. E.g. 10TB in one day (postal service is fast here ;) ), this would be a throughput of around 115 MB per second and would be super safe (pgp encryption is a good plan nevertheless). Furthermore I think that's a question for hbase-user. Best wishes Wilm On 12.03.2015 at 18:38, Akmal Abbasov wrote: Hi, I am new to Hadoop and HBase. I have an HBase cluster in one datacenter, and I need to create a backup in the second one. Currently the second HBase cluster is ready, and I would like to import data from the first cluster. I would like to use the exportSnapshot tool for this; I’ve tried it on my test environment, and it worked well. But, since now I am going to export to a different cluster in a different datacenter, I would like to be sure that my data is secure. So how can I make exportSnapshot secure? As far as I understood, exportSnapshot uses the distcp tool to copy a snapshot to the destination cluster, so in this case is it enough to configure distcp? Thank you!
[jira] [Created] (HBASE-13228) Create procedure v2 branch
Matteo Bertozzi created HBASE-13228: --- Summary: Create procedure v2 branch Key: HBASE-13228 URL: https://issues.apache.org/jira/browse/HBASE-13228 Project: HBase Issue Type: Sub-task Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi to develop Procedure V2 quickly, we are going to commit stuff to an hbase-12439 branch. In theory we can have QA running if the patch name is HBASE-xyz-hbase-12439.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Jira role cleanup
FYI, some time next week I'm going to try to simplify our jira role list. When I go to add new contributors it takes forever, I'm guessing because of the list size. I think I can trim it down some by making sure folks who are committers are in the committer list and not the contributor list. AFAICT, our jira is set to allow committers to do a proper superset of what contributors can do. Some folks are already listed in both. I'll send a summary email warning about what new powers folks have if there is anyone who's currently only in the contributor list. -- Sean
[jira] [Created] (HBASE-13230) [mob] reads hang when trying to read rows with large mobs (10MB)
Jonathan Hsieh created HBASE-13230: -- Summary: [mob] reads hang when trying to read rows with large mobs (10MB) Key: HBASE-13230 URL: https://issues.apache.org/jira/browse/HBASE-13230 Project: HBase Issue Type: Bug Components: mob Affects Versions: hbase-11339 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: hbase-11339 Using the load test tool to read and write out 5MB, 10MB, and 20MB objects works fine, but problems are encountered when trying to read values larger than 20MB. This is due to the default protobuf size limit of 64MB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
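For context, a sketch of where the ceiling comes from, assuming protobuf-java 2.x: CodedInputStream enforces a 64MB default size limit, and parsing a message above it fails unless the limit is raised before parsing. The 256MB figure below is only an example.

{code}
import java.io.InputStream;
import com.google.protobuf.CodedInputStream;

public class PbSizeLimit {
  // Returns a stream whose size limit accommodates large mob values.
  static CodedInputStream withLargerLimit(InputStream in, int maxBytes) {
    CodedInputStream cis = CodedInputStream.newInstance(in);
    cis.setSizeLimit(maxBytes); // default is 64MB; e.g. 256 * 1024 * 1024
    return cis;
  }
}
{code}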
[jira] [Resolved] (HBASE-12766) TestSplitLogManager#testGetPreviousRecoveryMode sometimes fails due to race condition
[ https://issues.apache.org/jira/browse/HBASE-12766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-12766. Resolution: Duplicate Dupe of HBASE-13136 TestSplitLogManager#testGetPreviousRecoveryMode sometimes fails due to race condition - Key: HBASE-12766 URL: https://issues.apache.org/jira/browse/HBASE-12766 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/HBase-1.0/614/testReport/junit/org.apache.hadoop.hbase.master/TestSplitLogManager/testGetPreviousRecoveryMode/ : {code} java.lang.AssertionError: Mode4=LOG_SPLITTING at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hbase.master.TestSplitLogManager.testGetPreviousRecoveryMode(TestSplitLogManager.java:656) ... 2014-12-27 19:04:56,576 INFO [Thread-8] coordination.ZKSplitLogManagerCoordination(594): found orphan task testRecovery 2014-12-27 19:04:56,577 INFO [Thread-8] coordination.ZKSplitLogManagerCoordination(598): Found 1 orphan tasks and 0 rescan nodes 2014-12-27 19:04:56,578 DEBUG [main-EventThread] coordination.ZKSplitLogManagerCoordination(464): task not yet acquired /hbase/splitWAL/testRecovery ver = 0 2014-12-27 19:04:56,578 INFO [main-EventThread] coordination.ZKSplitLogManagerCoordination(548): creating orphan task /hbase/splitWAL/testRecovery 2014-12-27 19:04:56,578 INFO [main-EventThread] coordination.ZKSplitLogManagerCoordination(178): resubmitting unassigned orphan task /hbase/splitWAL/testRecovery 2014-12-27 19:04:56,578 INFO [main-EventThread] coordination.ZKSplitLogManagerCoordination(229): resubmitting task /hbase/splitWAL/testRecovery 2014-12-27 19:04:56,582 INFO [Thread-8] master.TestSplitLogManager(650): Mode1=LOG_SPLITTING 2014-12-27 19:04:56,584 DEBUG [main-EventThread] zookeeper.ZooKeeperWatcher(313): split-log-manager-tests58920b37-7850-44e5-8b97-871caff81fdb-0x14a8d231db7, quorum=localhost:60541, baseZNode=/hbase Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/splitWAL/testRecovery 2014-12-27 19:04:56,584 INFO [Thread-8] master.TestSplitLogManager(653): Mode2=LOG_SPLITTING 2014-12-27 19:04:56,584 DEBUG [Thread-8] coordination.ZKSplitLogManagerCoordination(870): Distributed log replay=true 2014-12-27 19:04:56,585 WARN [main-EventThread] coordination.ZKSplitLogManagerCoordination$GetDataAsyncCallback(996): task znode /hbase/splitWAL/testRecovery vanished or not created yet. 2014-12-27 19:04:56,585 INFO [main-EventThread] coordination.ZKSplitLogManagerCoordination(472): task /hbase/splitWAL/RESCAN01 entered state: DONE dummy-master,1,1 {code} From the above log we can see that by the time the following is called (line 654 in test): {code} slm.setRecoveryMode(false); {code} the split task was not done - it entered done state 1 millisecond later. So ZKSplitLogManagerCoordination#hasSplitLogTask was true and isForInitialization parameter is false, leading to the execution of the following branch: {code} } else if (!isForInitialization) { // splitlogtask hasn't drained yet, keep existing recovery mode return; {code} Thus recoveryMode was left in LOG_SPLITTING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
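The usual remedy for this kind of timing-sensitive assertion is to poll for the expected state with a deadline rather than asserting immediately; a generic sketch follows, not the actual fix in HBASE-13136.

{code}
public class WaitFor {
  interface Condition {
    boolean holds() throws Exception;
  }

  // Poll until the condition holds or the deadline passes, so the test does
  // not assert on state that another thread updates milliseconds later.
  static boolean waitFor(long timeoutMs, long intervalMs, Condition c) throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (c.holds()) {
        return true;
      }
      Thread.sleep(intervalMs);
    }
    return c.holds();
  }
}
{code}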
[jira] [Resolved] (HBASE-13064) Fix flakey TestEnableTableHandler
[ https://issues.apache.org/jira/browse/HBASE-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Stepachev resolved HBASE-13064. -- Resolution: Duplicate Release Note: seems fixed by HBASE-13076 and other issues. Fix flakey TestEnableTableHandler - Key: HBASE-13064 URL: https://issues.apache.org/jira/browse/HBASE-13064 Project: HBase Issue Type: Bug Reporter: Andrey Stepachev Assignee: Andrey Stepachev -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-3743) Throttle major compaction
[ https://issues.apache.org/jira/browse/HBASE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh resolved HBASE-3743. --- Resolution: Duplicate This is essentially a dupe of HBASE-8329 and HBASE-5867, both of which are closed out. Throttle major compaction - Key: HBASE-3743 URL: https://issues.apache.org/jira/browse/HBASE-3743 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Joep Rottinghuis Add the ability to throttle major compaction. For those use cases when a stop-the-world approach is not practical, it is useful to be able to throttle the impact that major compaction has on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Jira role cleanup
On Fri, Mar 13, 2015 at 11:01 AM, Andrew Purtell apurt...@apache.org wrote: +1 I think it would be fine to trim the contributor list too. We can always add people back on demand in order to (re)assign issues. I wasn't sure how we generate the list of contributors. But then I noticed that we don't link to jira for it like I thought we did[1]. How about I make a saved jira query for people who have had jira's assigned to them, add a link to that query for our here are the contributors section, and then trim off from the role anyone who hasn't been assigned an issue in the last year? [1]: http://hbase.apache.org/team-list.html -- Sean
Re: [DISCUSS] Dependency compatibility
I'm -1 (non-binding) on weakening our compatibility promises. The more we can isolate our users from the impact of changes upstream the better. We can't though in general. Making compatibility promises we can't keep because our upstreams don't (see the dependencies section of Hadoop's compatibility guidelines) is ultimately an untenable position. *If* we had some complete dependency isolation for MapReduce and coprocessors committed then this could be a different conversation. Am I misstating this? In this specific instance we do have another option, so we could defer this to a later time when a really unavoidable dependency change happens... like a Guava update affecting HDFS. (We had one of those before.) We can document the Jackson classpath issue with Hadoop = 2.6 and provide remediation advice in the troubleshooting section of the manual. I would be disappointed to see a VOTE thread. That means we failed to reach consensus and needed to fall back to process to resolve differences. Why don't we do the doc update and call it a day? On Thu, Mar 12, 2015 at 7:32 PM, Sean Busbey bus...@cloudera.com wrote: On Thu, Mar 12, 2015 at 1:20 PM, Enis Söztutar enis@gmail.com wrote: This is a good discussion, but I would like to reach a consensus and move on with HBASE-13149. My conclusion from the thread is that we cannot be realistically expected to keep dependency compat between minor versions because of lack of shading in HBase and Hadoop, and our dependencies are themselves not semver, and we cannot promise more than our dependencies promise. So I would like to formally drop support for dependency compat between minor versions as defined in https://hbase.apache.org/book.html#hbase.versioning. We can reintroduce later when we have better isolation/guarantees. In the meantime, we can move on. I'm -1 (non-binding) on weakening our compatibility promises. The more we can isolate our users from the impact of changes upstream the better. If our dependencies aren't semver it's all the more reason for us to be disciplined about 1) accepting them as exposed in the first place and 2) changing them. This is the first problem that has presented itself in the face of the restrictions we adopted as a community. I don't care for the precedent of us solving it by weakening those promises. For one thing, it reinforces messaging from vendors that folks need them to protect against choices individual projects make. If I have a solution that works to separate us from Hadoop when running on YARN in Hadoop 2.6+ before 1.1 is ready, can we keep our compat at the current strength? Some other deadline? AFAIK, we have no direct need for Jackson 1.9. The PMC has approved the compat guide, but I am not sure whether we need a VOTE thread. What do you guys think? My main concern with not having a VOTE thread is that some PMC members might be more likely to pay attention to the matter if there's a VOTE, so a DISCUSS thread might only show consensus among a subset. Changes to the promises we make downstream are a big deal, so I'd prefer to err on the side of more participation. I'd also really like that VOTE thread to include user@hbase, since this impacts downstream users more than us directly. -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Jira role cleanup
+1 I think it would be fine to trim the contributor list too. We can always add people back on demand in order to (re)assign issues. On Fri, Mar 13, 2015 at 8:32 AM, Sean Busbey bus...@cloudera.com wrote: FYI, some time next week I'm going to try to simplify our jira role list. When I go to add new contributors it takes forever, I'm guessing because of the list size. I think I can trim it down some by making sure folks who are committers are in the committer list and not the contributor list. AFAICT, our jira is set to allow committers to do a proper superset of what contributors can do. Some folks are already listed in both. I'll send a summary email warning about what new powers folks have if there is anyone who's currently only in the contributor list. -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Jira role cleanup
How about I make a saved jira query for people who have had jira's assigned to them, add a link to that query for our here are the contributors section, and then trim off from the role anyone who hasn't been assigned an issue in the last year? That sounds like a very fair proposition. The JIRA role list isn't public I think, but just in case. Anyway, there's every reason to call out our contributors publicly and offer acknowledgement as thanks. On Fri, Mar 13, 2015 at 9:09 AM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 11:01 AM, Andrew Purtell apurt...@apache.org wrote: +1 I think it would be fine to trim the contributor list too. We can always add people back on demand in order to (re)assign issues. I wasn't sure how we generate the list of contributors. But then I noticed that we don't link to jira for it like I thought we did[1]. How about I make a saved jira query for people who have had jira's assigned to them, add a link to that query for our here are the contributors section, and then trim off from the role anyone who hasn't been assigned an issue in the last year? [1]: http://hbase.apache.org/team-list.html -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: [DISCUSS] Dependency compatibility
On Fri, Mar 13, 2015 at 11:18 AM, Andrew Purtell apurt...@apache.org wrote: I'm -1 (non-binding) on weakening our compatibility promises. The more we can isolate our users from the impact of changes upstream the better. We can't though in general. Making compatibility promises we can't keep because our upstreams don't (see the dependencies section of Hadoop's compatibility guidelines) is ultimately an untenable position. *If* we had some complete dependency isolation for MapReduce and coprocessors committed then this could be a different conversation. Am I misstating this? In this specific instance we do have another option, so we could defer this to a later time when a really unavoidable dependency change happens... like a Guava update affecting HDFS. (We had one of those before.) We can document the Jackson classpath issue with Hadoop = 2.6 and provide remediation advice in the troubleshooting section of the manual. I think we can solve this generally for Hadoop 2.6.0+. There's no reason our HDFS usage should be exposed in the HBase client code, and I think the application classpath feature for YARN in that version can isolate us on the MR side. I am willing to do this work in time for 1.1. Realistically I don't know the timeline for that version yet. If it turns out the work is more involved or my time is more constrained than I think, I'm willing to accept promise weakening as a practical matter. I'd be much more comfortable weakening our dependency promises for coprocessors than doing it in general. Folks running coprocessors should already be more risk tolerant and familiar with our internals. For upstreams that don't have the leverage on us of Hadoop, we solve this problem simply by not updating dependencies that we can't trust to not break our downstreams. I would be disappointed to see a VOTE thread. That means we failed to reach consensus and needed to fall back to process to resolve differences. That's fair. What about the wider audience issue on user@? There's no reason our DISCUSS threads couldn't go there as well. Why don't we do the doc update and call it a day? I've been burned by dependency changes in projects I rely on many times in the past, usually over changes in code sections that folks didn't think were likely to be used. So I'm very willing to do work now to save downstream users of HBase that same headache. -- Sean
[jira] [Created] (HBASE-13231) shell script rewrite
Allen Wittenauer created HBASE-13231: Summary: shell script rewrite Key: HBASE-13231 URL: https://issues.apache.org/jira/browse/HBASE-13231 Project: HBase Issue Type: New Feature Components: scripts, shell Affects Versions: 2.0.0 Reporter: Allen Wittenauer This JIRA is for updating the HBase shell code to something remotely modern. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] Dependency compatibility
There's no reason our HDFS usage should be exposed in the HBase client code. I did look at this in the past; IIRC, our dependency was that we use hadoop-common code to read our XML configuration files. I would +1 a code duplication to remove the dependency. I also think it is important for the end user. On Fri, Mar 13, 2015 at 5:43 PM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 11:18 AM, Andrew Purtell apurt...@apache.org wrote: I'm -1 (non-binding) on weakening our compatibility promises. The more we can isolate our users from the impact of changes upstream the better. We can't though in general. Making compatibility promises we can't keep because our upstreams don't (see the dependencies section of Hadoop's compatibility guidelines) is ultimately an untenable position. *If* we had some complete dependency isolation for MapReduce and coprocessors committed then this could be a different conversation. Am I misstating this? In this specific instance we do have another option, so we could defer this to a later time when a really unavoidable dependency change happens... like a Guava update affecting HDFS. (We had one of those before.) We can document the Jackson classpath issue with Hadoop = 2.6 and provide remediation advice in the troubleshooting section of the manual. I think we can solve this generally for Hadoop 2.6.0+. There's no reason our HDFS usage should be exposed in the HBase client code, and I think the application classpath feature for YARN in that version can isolate us on the MR side. I am willing to do this work in time for 1.1. Realistically I don't know the timeline for that version yet. If it turns out the work is more involved or my time is more constrained than I think, I'm willing to accept promise weakening as a practical matter. I'd be much more comfortable weakening our dependency promises for coprocessors than doing it in general. Folks running coprocessors should already be more risk tolerant and familiar with our internals. For upstreams that don't have the leverage on us of Hadoop, we solve this problem simply by not updating dependencies that we can't trust to not break our downstreams. I would be disappointed to see a VOTE thread. That means we failed to reach consensus and needed to fall back to process to resolve differences. That's fair. What about the wider audience issue on user@? There's no reason our DISCUSS threads couldn't go there as well. Why don't we do the doc update and call it a day? I've been burned by dependency changes in projects I rely on many times in the past, usually over changes in code sections that folks didn't think were likely to be used. So I'm very willing to do work now to save downstream users of HBase that same headache. -- Sean
Re: [DISCUSS] Dependency compatibility
There's no reason our HDFS usage should be exposed in the HBase client code, and I think the application classpath feature for YARN in that version can isolate us on the MR side. I was thinking more of the case where we have to bump our version of Guava because our version and Hadoop's version are mutually incompatible, causing compilation failures or runtime failures or both. This was a thing once. Would it be possible to have different dependencies specified for client and server Maven projects? I suppose we could hack this, though it would be ugly. On Fri, Mar 13, 2015 at 9:43 AM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 11:18 AM, Andrew Purtell apurt...@apache.org wrote: I'm -1 (non-binding) on weakening our compatibility promises. The more we can isolate our users from the impact of changes upstream the better. We can't though in general. Making compatibility promises we can't keep because our upstreams don't (see the dependencies section of Hadoop's compatibility guidelines) is ultimately an untenable position. *If* we had some complete dependency isolation for MapReduce and coprocessors committed then this could be a different conversation. Am I misstating this? In this specific instance we do have another option, so we could defer this to a later time when a really unavoidable dependency change happens... like a Guava update affecting HDFS. (We had one of those before.) We can document the Jackson classpath issue with Hadoop = 2.6 and provide remediation advice in the troubleshooting section of the manual. I think we can solve this generally for Hadoop 2.6.0+. There's no reason our HDFS usage should be exposed in the HBase client code, and I think the application classpath feature for YARN in that version can isolate us on the MR side. I am willing to do this work in time for 1.1. Realistically I don't know the timeline for that version yet. If it turns out the work is more involved or my time is more constrained than I think, I'm willing to accept promise weakening as a practical matter. I'd be much more comfortable weakening our dependency promises for coprocessors than doing it in general. Folks running coprocessors should already be more risk tolerant and familiar with our internals. For upstreams that don't have the leverage on us of Hadoop, we solve this problem simply by not updating dependencies that we can't trust to not break our downstreams. I would be disappointed to see a VOTE thread. That means we failed to reach consensus and needed to fall back to process to resolve differences. That's fair. What about the wider audience issue on user@? There's no reason our DISCUSS threads couldn't go there as well. Why don't we do the doc update and call it a day? I've been burned by dependency changes in projects I rely on many times in the past, usually over changes in code sections that folks didn't think were likely to be used. So I'm very willing to do work now to save downstream users of HBase that same headache. -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
[jira] [Created] (HBASE-13232) ConnectionManger : Batch pool threads and metaLookup pool threads should use different name pattern
Anoop Sam John created HBASE-13232: -- Summary: ConnectionManger : Batch pool threads and metaLookup pool threads should use different name pattern Key: HBASE-13232 URL: https://issues.apache.org/jira/browse/HBASE-13232 Project: HBase Issue Type: Bug Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Trivial Fix For: 2.0.0, 1.1.0 This is a small issue that happened with HBASE-13036. Different names are passed to getThreadPool as nameHint but the hint is not being used. Found this small issue while checking HBASE-13219. -- This message was sent by Atlassian JIRA (v6.3.4#6332)