[jira] [Created] (HBASE-13233) add hbase-11339 branch to the patch testing script
Jonathan Hsieh created HBASE-13233: -- Summary: add hbase-11339 branch to the patch testing script Key: HBASE-13233 URL: https://issues.apache.org/jira/browse/HBASE-13233 Project: HBase Issue Type: Task Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh adding hbase-11339 to the BRANCH_NAMES so we can use the apache bot to test patches on that branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13234) Improve the obviousness of the download link on hbase.apache.org
Andrew Purtell created HBASE-13234: -- Summary: Improve the obviousness of the download link on hbase.apache.org Key: HBASE-13234 URL: https://issues.apache.org/jira/browse/HBASE-13234 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Update the hbase.apache.org homepage to include a very obvious section describing how a user can Download HBase Software Here with a link. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] Dependency compatibility
On Fri, Mar 13, 2015 at 11:59 AM, Andrew Purtell apurt...@apache.org wrote: There's no reason our HDFS usage should be exposed in the HBase client code, and I think the application classpath feature for YARN in that version can isolate us on the MR side. I was thinking more of the case where we have to bump our version of Guava because our version and Hadoop's version are mutually incompatible, causing compilation failures or runtime failures or both. This was a thing once. Would it be possible to have different dependencies specified for client and server Maven projects? I suppose we could hack this, though it would be ugly. Yes, this is certainly doable in Maven. I think such a change would need to come with documentation of, and changes to, the assumptions in our architecture, namely that the client side code relates to the server side code only via RPC. I'm not sure if this sounds more like a 2.0 thing or a 1.1 / 1.2 thing. -- Sean
Re: Rough goal timelines for 1.1 and 2.0
Do we need to couple decisions for 1.1 and 2.0 in the same discussion? On Fri, Mar 13, 2015 at 10:16 AM, Sean Busbey bus...@cloudera.com wrote: I think the last time this came up the answer to when? was * 1.1 in time for phoenix 5 * 2.0 later than 6 months and sooner than 6 years from 1.0 Can we discuss some goal post dates for these versions? I'd like to help Allen get HBASE-13231 (shell script update) done, and he's wondering how much he needs to prioritize it based on our expectations for when things come out. What do folks think of initial RC for 1.1 at the end of June and initial RC for 2.0 around November? Hopefully the 1.1 RC process will take ~1 month and the 2.0 ~2 months. That would have 2.0 come out just around a year after 1.0. -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: hbase security issue
You’re going to need to set up a trust in Kerberos, either a one-way trust or a two-way trust. YMMV. It's not as simple as just setting up SSL; you can set up SSL to encrypt the traffic between clusters, however the clusters themselves are not secured. HTH On Mar 12, 2015, at 4:39 PM, Vladimir Rodionov vladrodio...@gmail.com wrote: Thanks, Jerry. I think webhdfs is preferable since it is natively supported by hdfs (name node and data nodes) and traffic does not pass through a single gateway? Found this link on how to set up webhdfs over SSL: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_Security_Guide/content/ch_wire-webhdfs-mr-yarn.html Cool, if it works :). -Vlad On Thu, Mar 12, 2015 at 2:24 PM, Jerry He jerry...@gmail.com wrote: Hi, Vladimir Hope I understand your question correctly. If both the local cluster and the remote cluster are Kerberos enabled, ExportSnapshot from local to remote will work as long as both clusters' Kerberos have been set up in a way that they understand each other. If the remote cluster's httpfs/webhdfs port is protected by https security, after you set up the certificate on the client side, you will be able to talk to the remote port with SSL protection. Jerry On Thu, Mar 12, 2015 at 1:48 PM, Vladimir Rodionov vladrodio...@gmail.com wrote: You can also specify the remote target with a httpfs or webhdfs url, with which you can then leverage SSL on the transport. What if the remote cluster has security enabled? Will it work? -Vlad On Thu, Mar 12, 2015 at 1:39 PM, Jerry He jerry...@gmail.com wrote: ExportSnapshot does not use DistCp but directly uses the FileSystem API to copy, as Vladimir mentioned. But ExportSnapshot supports exporting to a remote target cluster. Give the full hdfs url. You can also specify the remote target with a httpfs or webhdfs url, with which you can then leverage SSL on the transport. You can also copy to the local cluster and use DistCp to copy to the remote cluster. Jerry On Thu, Mar 12, 2015 at 12:28 PM, Vladimir Rodionov vladrodio...@gmail.com wrote: No, ExportSnapshot does not use DistCp; it runs its own M/R job to copy data over to a new destination. In a map task it uses the HDFS API to create/write data to a new destination. Therefore, the easiest way to secure communication during this operation is to use secure HDFS transport. http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-3-1/CDH4-Security-Guide/cdh4sg_topic_14_2.html but there is a caveat ... ExportSnapshot does not support external cluster configuration - you can't provide a path to an external cluster config dir. This seems like a good feature request. -Vlad On Thu, Mar 12, 2015 at 10:38 AM, Akmal Abbasov akmal.abba...@icloud.com wrote: Hi, I am new to Hadoop and HBase. I have an HBase cluster in one datacenter, and I need to create a backup in the second one. Currently the second HBase cluster is ready, and I would like to import data from the first cluster. I would like to use the exportSnapshot tool for this; I’ve tried it on my test environment, and it worked well. But, since now I am going to export to a different cluster in a different datacenter, I would like to be sure that my data is secure. So how can I make exportSnapshot secure? As far as I understood, exportSnapshot uses the distcp tool to copy a snapshot to the destination cluster, so in this case is it enough to configure distcp? Thank you! The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com
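To make the ExportSnapshot discussion above concrete, here is a minimal sketch of driving the tool programmatically against a webhdfs-over-SSL target. The snapshot name, host, port, and path are placeholders, and it assumes an HBase version in which ExportSnapshot implements the Hadoop Tool interface; the equivalent command line would be hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot ... -copy-to ...

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.snapshot.ExportSnapshot;
import org.apache.hadoop.util.ToolRunner;

public class ExportToRemote {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // swebhdfs:// is webhdfs over SSL; plain webhdfs:// leaves the transport
    // unencrypted. Host, port and target path are placeholders.
    int rc = ToolRunner.run(conf, new ExportSnapshot(), new String[] {
        "-snapshot", "my_snapshot",
        "-copy-to", "swebhdfs://remote-nn.example.com:50470/hbase",
        "-mappers", "16"
    });
    System.exit(rc);
  }
}
{code}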
Re: Rough goal timelines for 1.1 and 2.0
That was my question.. We can discuss them independently? Or is there a reason not to? On Fri, Mar 13, 2015 at 11:10 AM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 12:31 PM, Andrew Purtell apurt...@apache.org wrote: Do we need to couple decisions for 1.1 and 2.0 in the same discussion? Like what? Interface changes for Phoenix maybe? -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Rough goal timelines for 1.1 and 2.0
The only reason I can think of to make decisions now would be if we want to ensure we have consensus for the changes for Phoenix and enough time to implement them. Given that AFAIK it's those changes that'll drive having a 1.1 release, seems prudent. But I haven't been tracking the changes lately. I think we're all in agreement that something needs to be done, and that HBase 1.1 and Phoenix 5 are the places to do it. Probably it won't be contentious to just decide as changes are ready? -- Sean On Mar 13, 2015 1:28 PM, Andrew Purtell apurt...@apache.org wrote: That was my question.. We can discuss them independently? Or is there a reason not to? On Fri, Mar 13, 2015 at 11:10 AM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 12:31 PM, Andrew Purtell apurt...@apache.org wrote: Do we need to couple decisions for 1.1 and 2.0 in the same discussion? Like what? Interface changes for Phoenix maybe? -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: [DISCUSS] Dependency compatibility
I think we can solve this generally for Hadoop 2.6.0+. There's no reason our HDFS usage should be exposed in the HBase client code, and I think the application classpath feature for YARN in that version can isolate us on the MR side. I am willing to do this work in time for 1.1. Realistically I don't know the timeline for that version yet. If it turns out the work is more involved or my time is more constrained than I think, I'm willing to accept promise weakening as a practical matter. HBase-1.x series SHOULD work with Hadoop versions as old as 2.2. That is what we promise for 1.x series. So solving the problem in Hadoop-2.6+ will not solve it for 1.1. It is great if we can help Hadoop with classloading issues, do shading there, and do shading in HBase and reduce the dependencies etc. However, since we cannot do these in HBase-1.x series (we cannot shade deps in the same manner), I do not see a way to get around this other than what I propose. We have discussed many of the compat dimensions before we adopted them in PMC. For some of those (like binary compat), we decided that we cannot support them between minor versions in 1.x, so we decided 'false' on that dimension. For these we explicitly decided that when we can realistically have more guarantees, we can start supporting this dimension (client binary compat in minor versions) and have 2.x or later support those. I see the dependency compat dimension in the same vein. It is clear (at least to me) that we cannot pragmatically support any dep compat in 1.x series. If you/we can make all the necessary changes in Hadoop and HBase, we can reintroduce it. Until then though, I would rather not block any progress and drop the support. As you said the timeline is not clear, so why are we cornering ourselves (especially for 1.x series) over this? I'd be much more comfortable weakening our dependency promises for coprocessors than doing it in general. Folks running coprocessors should already be more risk tolerant and familiar with our internals. For upstreams that don't have the leverage on us of Hadoop, we solve this problem simply by not updating dependencies that we can't trust to not break our downstreams. I would be disappointed to see a VOTE thread. That means we failed to reach consensus and needed to fall back to process to resolve differences. That's fair. What about the wider audience issue on user@? There's no reason our DISCUSS threads couldn't go there as well. Why don't we do the doc update and call it a day? I've been burned by dependency changes in projects I rely on many times in the past, usually over changes in code sections that folks didn't think were likely to be used. So I'm very willing to do work now to save downstream users of HBase that same headache. -- Sean
[jira] [Created] (HBASE-13235) Revisit the security auditing semantics.
Srikanth Srungarapu created HBASE-13235: --- Summary: Revisit the security auditing semantics. Key: HBASE-13235 URL: https://issues.apache.org/jira/browse/HBASE-13235 Project: HBase Issue Type: Improvement Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu More specifically, the following things need a closer look. (Will include more based on feedback and/or suggestions) * Table name (say test) instead of the fully qualified table name (default:test) being used. * Right now, the scope is used similarly to the arguments for the operation. It would be better to decouple the arguments for the operation from the scope involved in checking. E.g., for createTable, we have the following audit log {code} Access denied for user esteban; reason: Insufficient permissions; remote address: /10.20.30.1; request: createTable; context: (user=srikanth@XXX, scope=default, action=CREATE) {code} The scope was rightly being used as the default namespace, but we're missing information like the operation params for CREATE, which we used to log prior to HBASE-12511. Would love to hear inputs on this! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Rough goal timelines for 1.1 and 2.0
I think the last time this came up the answer to when? was * 1.1 in time for phoenix 5 * 2.0 later than 6 months and sooner than 6 years from 1.0 Can we discuss some goal post dates for these versions? I'd like to help Allen get HBASE-13231 (shell script update) done, and he's wondering how much he needs to prioritize it based on our expectations for when things come out. What do folks think of initial RC for 1.1 at the end of June and initial RC for 2.0 around November? Hopefully the 1.1 RC process will take ~1 month and the 2.0 ~2 months. That would have 2.0 come out just around a year after 1.0. -- Sean
Re: Rough goal timelines for 1.1 and 2.0
On Fri, Mar 13, 2015 at 12:31 PM, Andrew Purtell apurt...@apache.org wrote: Do we need to couple decisions for 1.1 and 2.0 in the same discussion? Like what? Interface changes for Phoenix maybe? -- Sean
[jira] [Resolved] (HBASE-13232) ConnectionManger : Batch pool threads and metaLookup pool threads should use different name pattern
[ https://issues.apache.org/jira/browse/HBASE-13232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John resolved HBASE-13232. Resolution: Fixed Hadoop Flags: Reviewed Thanks Nick. Pushed the trivial change to master and branch-1 ConnectionManger : Batch pool threads and metaLookup pool threads should use different name pattern --- Key: HBASE-13232 URL: https://issues.apache.org/jira/browse/HBASE-13232 Project: HBase Issue Type: Bug Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Trivial Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13232.patch This is a small issue that happened with HBASE-13036. Different names are passed to getThreadPool as nameHint but the hint is not being used. Found this small issue while checking HBASE-13219. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
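To illustrate the fix's intent (a sketch, not the committed patch): wire the nameHint into each pool's ThreadFactory so batch and metaLookup threads are distinguishable in jstack output. The pool sizes and name strings below are illustrative.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedPools {
  static ThreadFactory named(final String nameHint) {
    final AtomicInteger seq = new AtomicInteger();
    return new ThreadFactory() {
      public Thread newThread(Runnable r) {
        Thread t = new Thread(r);
        // Without the hint both pools would produce identically named
        // threads, making thread dumps ambiguous.
        t.setName(nameHint + "-pool-" + seq.incrementAndGet());
        t.setDaemon(true);
        return t;
      }
    };
  }

  public static void main(String[] args) {
    ExecutorService batchPool = Executors.newFixedThreadPool(4, named("hconnection-batch"));
    ExecutorService metaLookupPool = Executors.newFixedThreadPool(2, named("hconnection-metaLookup"));
    batchPool.shutdown();
    metaLookupPool.shutdown();
  }
}
{code}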
Re: hbase security issue
Nope, goes beyond a VPN. Securing a cluster can be a very painful task. On Mar 13, 2015, at 6:05 AM, Akmal Abbasov akmal.abba...@icloud.com wrote: Hi Wilm, My initial choice was to use VPN, but I couldn’t find any related information. Hi, putting the many good hints in this thread aside ... isn't this more a question of network deployment than a question of hbase or hadoop features? I think a more general plan would be the usage of some VPN channel technology. By this plan a) the data is transferred securely b) the machines are not accessible from the open internet. As database servers shouldn't be located on the open internet, you already have to use some sort of protection against the rest of the world. A VPN approach would work for distcp, exportSnapshot, cluster replication, or the plain file system (for scp or such). Could you please explain in a bit more detail how to get exportSnapshot to go through the VPN. Thank you. And, I know that sounds like a joke, perhaps sending it by mail could be a good plan. E.g. 10TB in one day (postal service is fast here ;) ), this would be a throughput of around 115 MB per second and would be super safe (pgp encryption is a good plan nevertheless). Furthermore I think that's a question for hbase-user. Best wishes Wilm On 12.03.2015 at 18:38, Akmal Abbasov wrote: Hi, I am new to Hadoop and HBase. I have an HBase cluster in one datacenter, and I need to create a backup in the second one. Currently the second HBase cluster is ready, and I would like to import data from the first cluster. I would like to use the exportSnapshot tool for this; I’ve tried it on my test environment, and it worked well. But, since now I am going to export to a different cluster in a different datacenter, I would like to be sure that my data is secure. So how can I make exportSnapshot secure? As far as I understood, exportSnapshot uses the distcp tool to copy a snapshot to the destination cluster, so in this case is it enough to configure distcp? Thank you! The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com
[jira] [Resolved] (HBASE-13234) Improve the obviousness of the download link on hbase.apache.org
[ https://issues.apache.org/jira/browse/HBASE-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-13234. Resolution: Fixed Hadoop Flags: Reviewed Pushed site source change to master. Regenerated site using 'mvn site' and committed to SVN. Improve the obviousness of the download link on hbase.apache.org Key: HBASE-13234 URL: https://issues.apache.org/jira/browse/HBASE-13234 Project: HBase Issue Type: Task Components: documentation Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0 Attachments: HBASE-13234.patch, screenshot.png Update the hbase.apache.org homepage to include a very obvious section describing how a user can Download HBase Software Here with a link. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13236) Clean up m2e-related warnings/errors from poms
Josh Elser created HBASE-13236: -- Summary: Clean up m2e-related warnings/errors from poms Key: HBASE-13236 URL: https://issues.apache.org/jira/browse/HBASE-13236 Project: HBase Issue Type: Improvement Components: build Reporter: Josh Elser Priority: Minor Fix For: 2.0.0, 1.1.0 Pulled down HBase, imported into Eclipse (either directly with m2eclipse or by running {{mvn eclipse:eclipse}} to generate the projects), and this results in a bunch of red due to executions/goals of certain plugins being unable to run in the context of eclipse. The lifecycle-mapping plugin can be used to get around these errors (and already exists in the pom). Add more mappings to the configuration so that a fresh import into Eclipse is not hindered by a bunch of 'false' errors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13237) Improve trademark marks on the hbase.apache.org homepage
Andrew Purtell created HBASE-13237: -- Summary: Improve trademark marks on the hbase.apache.org homepage Key: HBASE-13237 URL: https://issues.apache.org/jira/browse/HBASE-13237 Project: HBase Issue Type: Task Components: documentation Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0 Ensure trademark marks are next to first and prominent uses of HBase on the hbase.apache.org homepage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13238) Time out locks and abort if HDFS is wedged
Andrew Purtell created HBASE-13238: -- Summary: Time out locks and abort if HDFS is wedged Key: HBASE-13238 URL: https://issues.apache.org/jira/browse/HBASE-13238 Project: HBase Issue Type: Brainstorming Reporter: Andrew Purtell This is a brainstorming issue on the topic of timing out locks and aborting if HDFS is wedged. We had a minor production incident where a region was unable to close after 24 hours. The CloseRegionHandler was waiting for a write lock on the ReentrantReadWriteLock we take in HRegion#doClose. There were outstanding read locks. Three other threads were stuck in scanning, all blocked on the same DFSInputStream. Two were blocked in DFSInputStream#getFileLength, the third was waiting in epoll from SocketIOWithTimeout$SelectorPool#select with an apparent infinite timeout from PacketReceiver#readChannelFully. This is similar to other issues we have seen before, in the context of the region wanting to finish a compaction but being unable to due to some HDFS issue causing the reader to become extremely slow if not wedged. The Hadoop version was 2.3 (specifically 2.3 CDH 5.0.1), and we are planning to upgrade, but [~lhofhansl] and I were discussing the issue in general and wonder if we should not be timing out locks such as the ReentrantReadWriteLock, and if so, abort the regionserver. In this case that would have caused recovery and reassignment of the region in question and we would not have had a prolonged availability problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
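To make the proposal concrete, a minimal sketch of the idea, assuming a configurable timeout; abort() below stands in for the regionserver abort path, and none of this is actual HRegion code.

{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TimedCloseLock {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  void doClose(long timeoutMs) throws InterruptedException {
    // tryLock with a deadline instead of an indefinite lock(): if readers
    // are wedged on a stuck DFSInputStream we fail fast instead of hanging.
    if (!lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
      abort("could not acquire close lock within " + timeoutMs + "ms");
      return;
    }
    try {
      // ... close stores, drain scanners, etc.
    } finally {
      lock.writeLock().unlock();
    }
  }

  private void abort(String reason) {
    // On a regionserver this would trigger recovery and reassignment of the
    // region instead of a prolonged availability problem.
    throw new RuntimeException("Aborting: " + reason);
  }
}
{code}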
Re: Status of Huawei's 2' Indexing?
When I made that remark I was thinking of a recent discussion we had at a joint Phoenix and HBase developer meetup. The difference of opinion was certainly civilized. (smile) I'm not aware of any specific written discussion, it may or may not exist. I'm pretty sure a revival of HBASE-9203 would attract some controversy, but let me be clearer this time than I was before that this is just my opinion, FWIW. On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: I saw that it was added to their project. I’m really not keen on bringing in all the RDBMS apparatus on top of hbase, so I decided to follow other avenues first (like trying to patch 0.98, for better or worse.) That Phoenix article seems like a good breakdown of the various indexing architectures. HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as are most of them, it seems) so I didn’t know there were these differences of opinion. Did I miss the mailing list thread where the architectural differences were discussed? -j On 3/12/15, 5:22 PM, Andrew Purtell apurt...@apache.org wrote: There are some substantial architectural differences of opinion among the community on this feature as I understand it, so it's unlikely that JIRA will ever see a commit without a lot more work, if ever. A similar feature was later introduced into Apache Phoenix, which in this context may best be described as an extension package for HBase offering a suite of relational data management features. You may want to check out http://phoenix.apache.org/secondary_indexing.html and https://issues.apache.org/jira/browse/PHOENIX-933 for background. On Thu, Mar 12, 2015 at 1:39 PM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: Hi, I’ve been looking over the Jira tickets for the secondary indexing mechanism Huawei had started to integrate back in 2013 (see https://issues.apache.org/jira/browse/HBASE-10222). The code was developed against 0.94 and it seems like a lot of work was done — but then it suddenly stops (the last update to HBASE-10222, the ticket for the work that actually adds secondary indexes, was a bit over a year ago. The last update for the load balancer work was from early last fall.) Is there work on this that I don’t see? I understand I can run this using Huawei’s code for 0.94 but I was hoping for a more recent hbase build. And I’ve tried applying the patches in HBASE-10222 (hope springs eternal); naturally there were some failures. I thought I’d ask here before trying to work through the failed hunks — and ask if you think that’s even a good idea in the first place. Thanks for your input! -j -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
[jira] [Created] (HBASE-13240) add an exemption to test-patch for build-only changes.
Sean Busbey created HBASE-13240: --- Summary: add an exemption to test-patch for build-only changes. Key: HBASE-13240 URL: https://issues.apache.org/jira/browse/HBASE-13240 Project: HBase Issue Type: Improvement Components: build Reporter: Sean Busbey Assignee: Sean Busbey Priority: Minor we've had a couple of patches lately that got pinged for no tests, but they were build-only changes. expand the exemption list from docs to also include changes just to build infra. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13241) Add tests for group level grants
Sean Busbey created HBASE-13241: --- Summary: Add tests for group level grants Key: HBASE-13241 URL: https://issues.apache.org/jira/browse/HBASE-13241 Project: HBase Issue Type: Improvement Components: security Reporter: Sean Busbey Priority: Critical We need to have tests for group-level grants for various scopes. ref: HBASE-13239 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13242) TestPerColumnFamilyFlush.testFlushingWhenLogRolling hung
zhangduo created HBASE-13242: Summary: TestPerColumnFamilyFlush.testFlushingWhenLogRolling hung Key: HBASE-13242 URL: https://issues.apache.org/jira/browse/HBASE-13242 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13239) Hbase grants at specific column level does not work for Groups
Jaymin Patel created HBASE-13239: Summary: Hbase grants at specific column level does not work for Groups Key: HBASE-13239 URL: https://issues.apache.org/jira/browse/HBASE-13239 Project: HBase Issue Type: Bug Components: hbase Affects Versions: 0.98.4 Reporter: Jaymin Patel Performing a grant command for a specific column in a table to a specific group does not produce the needed results. However, when a specific user is mentioned (instead of a group name) in the grant command, the grant becomes effective. Steps to Reproduce: 1) using the super-user, issue a table/column family/column level grant to a group 2) login as a user (part of the above group) and scan the table. It does not return any results 3) using the super-user, issue a table/column family/column level grant to a specific user (instead of the group) 4) login as that specific user and scan the table. It produces correct results, i.e. returns only the columns where the user has select privileges -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: hbase security issue
Hi Jerry, Hi, Vladimir Hope I understand your question correctly. If both the local cluster and the remote cluster are Kerberos enabled, ExportSnapshot from local to remote will work as long as both clusters' Kerberos have been set up in a way that they understand each other. Where can I find more information about this? If the remote cluster's httpfs/webhdfs port is protected by https security, after you set up the certificate on the client side, you will be able to talk to the remote port with SSL protection. Could you please provide more information about how I can securely transfer a snapshot from one datacenter to another using the technique you described. Thank you. You can also specify the remote target with a httpfs or webhdfs url, with which you can then leverage SSL on the transport. What if the remote cluster has security enabled? Will it work? -Vlad On Thu, Mar 12, 2015 at 1:39 PM, Jerry He jerry...@gmail.com wrote: ExportSnapshot does not use DistCp but directly uses the FileSystem API to copy, as Vladimir mentioned. But ExportSnapshot supports exporting to a remote target cluster. Give the full hdfs url. You can also specify the remote target with a httpfs or webhdfs url, with which you can then leverage SSL on the transport. You can also copy to the local cluster and use DistCp to copy to the remote cluster. Jerry On Thu, Mar 12, 2015 at 12:28 PM, Vladimir Rodionov vladrodio...@gmail.com wrote: No, ExportSnapshot does not use DistCp; it runs its own M/R job to copy data over to a new destination. In a map task it uses the HDFS API to create/write data to a new destination. Therefore, the easiest way to secure communication during this operation is to use secure HDFS transport. http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-3-1/CDH4-Security-Guide/cdh4sg_topic_14_2.html but there is a caveat ... ExportSnapshot does not support external cluster configuration - you can't provide a path to an external cluster config dir. This seems like a good feature request. -Vlad On Thu, Mar 12, 2015 at 10:38 AM, Akmal Abbasov akmal.abba...@icloud.com wrote: Hi, I am new to Hadoop and HBase. I have an HBase cluster in one datacenter, and I need to create a backup in the second one. Currently the second HBase cluster is ready, and I would like to import data from the first cluster. I would like to use the exportSnapshot tool for this; I’ve tried it on my test environment, and it worked well. But, since now I am going to export to a different cluster in a different datacenter, I would like to be sure that my data is secure. So how can I make exportSnapshot secure? As far as I understood, exportSnapshot uses the distcp tool to copy a snapshot to the destination cluster, so in this case is it enough to configure distcp? Thank you!
Re: hbase security issue
Hi Talat, I was considering replication, but decided to start with snapshots. Moreover, there are some drawbacks with replication, like propagation of user error, etc. Also I need a secure connection between data-centers, and I can't find information about this. On 13 Mar 2015, at 05:45, Talat Uyarer ta...@uyarer.com wrote: Hi Akmal, Why do you not use Cluster Replication? [1] http://hbase.apache.org/book.html#_cluster_replication On Mar 12, 2015 11:40 PM, Vladimir Rodionov vladrodio...@gmail.com wrote: Thanks, Jerry. I think webhdfs is preferable since it is natively supported by hdfs (name node and data nodes) and traffic does not pass through a single gateway? Found this link on how to set up webhdfs over SSL: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_Security_Guide/content/ch_wire-webhdfs-mr-yarn.html Cool, if it works :). -Vlad On Thu, Mar 12, 2015 at 2:24 PM, Jerry He jerry...@gmail.com wrote: Hi, Vladimir Hope I understand your question correctly. If both the local cluster and the remote cluster are Kerberos enabled, ExportSnapshot from local to remote will work as long as both clusters' Kerberos have been set up in a way that they understand each other. If the remote cluster's httpfs/webhdfs port is protected by https security, after you set up the certificate on the client side, you will be able to talk to the remote port with SSL protection. Jerry On Thu, Mar 12, 2015 at 1:48 PM, Vladimir Rodionov vladrodio...@gmail.com wrote: You can also specify the remote target with a httpfs or webhdfs url, with which you can then leverage SSL on the transport. What if the remote cluster has security enabled? Will it work? -Vlad On Thu, Mar 12, 2015 at 1:39 PM, Jerry He jerry...@gmail.com wrote: ExportSnapshot does not use DistCp but directly uses the FileSystem API to copy, as Vladimir mentioned. But ExportSnapshot supports exporting to a remote target cluster. Give the full hdfs url. You can also specify the remote target with a httpfs or webhdfs url, with which you can then leverage SSL on the transport. You can also copy to the local cluster and use DistCp to copy to the remote cluster. Jerry On Thu, Mar 12, 2015 at 12:28 PM, Vladimir Rodionov vladrodio...@gmail.com wrote: No, ExportSnapshot does not use DistCp; it runs its own M/R job to copy data over to a new destination. In a map task it uses the HDFS API to create/write data to a new destination. Therefore, the easiest way to secure communication during this operation is to use secure HDFS transport. http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-3-1/CDH4-Security-Guide/cdh4sg_topic_14_2.html but there is a caveat ... ExportSnapshot does not support external cluster configuration - you can't provide a path to an external cluster config dir. This seems like a good feature request. -Vlad On Thu, Mar 12, 2015 at 10:38 AM, Akmal Abbasov akmal.abba...@icloud.com wrote: Hi, I am new to Hadoop and HBase. I have an HBase cluster in one datacenter, and I need to create a backup in the second one. Currently the second HBase cluster is ready, and I would like to import data from the first cluster. I would like to use the exportSnapshot tool for this; I’ve tried it on my test environment, and it worked well. But, since now I am going to export to a different cluster in a different datacenter, I would like to be sure that my data is secure. So how can I make exportSnapshot secure? As far as I understood, exportSnapshot uses the distcp tool to copy a snapshot to the destination cluster, so in this case is it enough to configure distcp? Thank you!
[jira] [Created] (HBASE-13226) Document enable_table_replication and disable_table_replication shell commands
Ashish Singhi created HBASE-13226: - Summary: Document enable_table_replication and disable_table_replication shell commands Key: HBASE-13226 URL: https://issues.apache.org/jira/browse/HBASE-13226 Project: HBase Issue Type: Sub-task Components: documentation Reporter: Ashish Singhi Assignee: Ashish Singhi Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: hbase security issue
Hi, putting the many good hints in this thread aside ... isn't this more a question of network deployment than a question of hbase or hadoop features? I think a more general plan would be the usage of some VPN channel technology. By this plan a) the data is transferred securely b) the machines are not accessible from the open internet. As database servers shouldn't be located on the open internet, you already have to use some sort of protection against the rest of the world. A VPN approach would work for distcp, exportSnapshot, cluster replication, or the plain file system (for scp or such). And, I know that sounds like a joke, perhaps sending it by mail could be a good plan. E.g. 10TB in one day (postal service is fast here ;) ), this would be a throughput of around 115 MB per second and would be super safe (pgp encryption is a good plan nevertheless). Furthermore I think that's a question for hbase-user. Best wishes Wilm On 12.03.2015 at 18:38, Akmal Abbasov wrote: Hi, I am new to Hadoop and HBase. I have an HBase cluster in one datacenter, and I need to create a backup in the second one. Currently the second HBase cluster is ready, and I would like to import data from the first cluster. I would like to use the exportSnapshot tool for this; I’ve tried it on my test environment, and it worked well. But, since now I am going to export to a different cluster in a different datacenter, I would like to be sure that my data is secure. So how can I make exportSnapshot secure? As far as I understood, exportSnapshot uses the distcp tool to copy a snapshot to the destination cluster, so in this case is it enough to configure distcp? Thank you!
[jira] [Created] (HBASE-13229) Bug compatibility validation to start local-regionservers.sh and local-master-backup.sh
Gustavo Anatoly created HBASE-13229: --- Summary: Bug compatibility validation to start local-regionservers.sh and local-master-backup.sh Key: HBASE-13229 URL: https://issues.apache.org/jira/browse/HBASE-13229 Project: HBase Issue Type: Bug Components: scripts Reporter: Gustavo Anatoly Assignee: Gustavo Anatoly Priority: Minor Running the following line, using /bin/sh: $ bin/local-regionservers.sh --config ~/hbase-dev/hbase-conf/conf/ start 1 2 3 4 5 Produces the output below: bin/local-regionservers.sh: 55: bin/local-regionservers.sh: [[: not found Invalid argument bin/local-regionservers.sh: 55: bin/local-regionservers.sh: [[: not found Invalid argument bin/local-regionservers.sh: 55: bin/local-regionservers.sh: [[: not found Invalid argument bin/local-regionservers.sh: 55: bin/local-regionservers.sh: [[: not found Invalid argument bin/local-regionservers.sh: 55: bin/local-regionservers.sh: [[: not found Invalid argument Considering: {code} if [[ $i =~ ^[0-9]+$ ]]; then run_master $cmd $i else echo Invalid argument fi {code} The reason is that the regex operator =~ is not compatible with /bin/sh but works when run under /bin/bash: $ bash -x bin/local-regionservers.sh --config ~/hbase-dev/hbase-conf/conf/ start 1 2 3 4 5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: hbase security issue
Hi, On 13.03.2015 at 12:05, Akmal Abbasov wrote: My initial choice was to use VPN, but I couldn’t find any related information. Well, I think that's because neither hbase nor hadoop has anything to do with vpn. It's completely independent. Just configure a VPN for all the machines in cluster A and cluster B (thus at least master A can access master B and vice versa*) and do the normal hbase stuff, e.g. cluster replication or exportSnapshot. It's no different from a VPN setup for mysql, or cassandra, etc. Best wishes, Wilm *) of course the vpn is much more complicated in a real world application, as the application server has to access the resp. masters, all machines in one cluster, and the clients have to access the application servers etc.
[jira] [Created] (HBASE-13227) LoadIncrementalHFile should skip non-files inside a possible family-dir
Matteo Bertozzi created HBASE-13227: --- Summary: LoadIncrementalHFile should skip non-files inside a possible family-dir Key: HBASE-13227 URL: https://issues.apache.org/jira/browse/HBASE-13227 Project: HBase Issue Type: Bug Components: Client, mapreduce Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Attachments: HBASE-13227-v0.patch if we have random files/dirs inside the bulkload family dir, we should try to skip them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
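A rough sketch of the intended check (not the attached patch): filter the family dir listing down to plain files before treating entries as HFiles.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FamilyDirScan {
  static void visitHFiles(FileSystem fs, Path familyDir) throws IOException {
    for (FileStatus stat : fs.listStatus(familyDir)) {
      if (!stat.isFile()) {
        // Stray subdirectories or other non-file entries are skipped
        // instead of failing the whole bulk load.
        continue;
      }
      // hand stat.getPath() to the per-family bulk load here
    }
  }
}
{code}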
Re: hbase security issue
Hi Wilm, My initial choice was to use VPN, but I couldn’t find any related information. Hi, putting the many good hints in this thread aside ... isn't this more a question of network deployment than a question of hbase or hadoop features? I think a more general plan would be the usage of some VPN channel technology. By this plan a) the data is transferred securely b) the machines are not accessible from the open internet. As database servers shouldn't be located on the open internet, you already have to use some sort of protection against the rest of the world. A VPN approach would work for distcp, exportSnapshot, cluster replication, or the plain file system (for scp or such). Could you please explain in a bit more detail how to get exportSnapshot to go through the VPN. Thank you. And, I know that sounds like a joke, perhaps sending it by mail could be a good plan. E.g. 10TB in one day (postal service is fast here ;) ), this would be a throughput of around 115 MB per second and would be super safe (pgp encryption is a good plan nevertheless). Furthermore I think that's a question for hbase-user. Best wishes Wilm On 12.03.2015 at 18:38, Akmal Abbasov wrote: Hi, I am new to Hadoop and HBase. I have an HBase cluster in one datacenter, and I need to create a backup in the second one. Currently the second HBase cluster is ready, and I would like to import data from the first cluster. I would like to use the exportSnapshot tool for this; I’ve tried it on my test environment, and it worked well. But, since now I am going to export to a different cluster in a different datacenter, I would like to be sure that my data is secure. So how can I make exportSnapshot secure? As far as I understood, exportSnapshot uses the distcp tool to copy a snapshot to the destination cluster, so in this case is it enough to configure distcp? Thank you!
[jira] [Created] (HBASE-13228) Create procedure v2 branch
Matteo Bertozzi created HBASE-13228: --- Summary: Create procedure v2 branch Key: HBASE-13228 URL: https://issues.apache.org/jira/browse/HBASE-13228 Project: HBase Issue Type: Sub-task Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi to develop Procedure V2 quickly, we are going to commit stuff to an hbase-12439 branch. In theory we can have QA running if the patch name is HBASE-xyz-hbase-12439.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Jira role cleanup
FYI, some time next week I'm going to try to simplify our jira role list. When I go to add new contributors it takes forever, I'm guessing because of the list size. I think I can trim it down some by making sure folks who are committers are in the committer list and not the contributor list. AFAICT, our jira is set to allow committers to do a proper superset of what contributors can do. Some folks are already listed in both. I'll send a summary email warning about what new powers folks have if there is anyone who's currently only in the contributor list. -- Sean
[jira] [Created] (HBASE-13230) [mob] reads hang when trying to read rows with large mobs (10MB)
Jonathan Hsieh created HBASE-13230: -- Summary: [mob] reads hang when trying to read rows with large mobs (10MB) Key: HBASE-13230 URL: https://issues.apache.org/jira/browse/HBASE-13230 Project: HBase Issue Type: Bug Components: mob Affects Versions: hbase-11339 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: hbase-11339 Using the load test tool to read and write out 5MB, 10MB, and 20MB objects works fine, but problems are encountered when trying to read values larger than 20MB. This is due to the default protobuf size limit of 64MB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
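For context, a sketch of where the ceiling comes from, assuming protobuf-java 2.x: CodedInputStream enforces a 64MB default size limit, and parsing a message above it fails unless the limit is raised before parsing. The 256MB figure below is only an example.

{code}
import java.io.InputStream;
import com.google.protobuf.CodedInputStream;

public class PbSizeLimit {
  // Returns a stream whose size limit accommodates large mob values.
  static CodedInputStream withLargerLimit(InputStream in, int maxBytes) {
    CodedInputStream cis = CodedInputStream.newInstance(in);
    cis.setSizeLimit(maxBytes); // default is 64MB; e.g. 256 * 1024 * 1024
    return cis;
  }
}
{code}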
[jira] [Resolved] (HBASE-12766) TestSplitLogManager#testGetPreviousRecoveryMode sometimes fails due to race condition
[ https://issues.apache.org/jira/browse/HBASE-12766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-12766. Resolution: Duplicate Dupe of HBASE-13136 TestSplitLogManager#testGetPreviousRecoveryMode sometimes fails due to race condition - Key: HBASE-12766 URL: https://issues.apache.org/jira/browse/HBASE-12766 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/HBase-1.0/614/testReport/junit/org.apache.hadoop.hbase.master/TestSplitLogManager/testGetPreviousRecoveryMode/ : {code} java.lang.AssertionError: Mode4=LOG_SPLITTING at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hbase.master.TestSplitLogManager.testGetPreviousRecoveryMode(TestSplitLogManager.java:656) ... 2014-12-27 19:04:56,576 INFO [Thread-8] coordination.ZKSplitLogManagerCoordination(594): found orphan task testRecovery 2014-12-27 19:04:56,577 INFO [Thread-8] coordination.ZKSplitLogManagerCoordination(598): Found 1 orphan tasks and 0 rescan nodes 2014-12-27 19:04:56,578 DEBUG [main-EventThread] coordination.ZKSplitLogManagerCoordination(464): task not yet acquired /hbase/splitWAL/testRecovery ver = 0 2014-12-27 19:04:56,578 INFO [main-EventThread] coordination.ZKSplitLogManagerCoordination(548): creating orphan task /hbase/splitWAL/testRecovery 2014-12-27 19:04:56,578 INFO [main-EventThread] coordination.ZKSplitLogManagerCoordination(178): resubmitting unassigned orphan task /hbase/splitWAL/testRecovery 2014-12-27 19:04:56,578 INFO [main-EventThread] coordination.ZKSplitLogManagerCoordination(229): resubmitting task /hbase/splitWAL/testRecovery 2014-12-27 19:04:56,582 INFO [Thread-8] master.TestSplitLogManager(650): Mode1=LOG_SPLITTING 2014-12-27 19:04:56,584 DEBUG [main-EventThread] zookeeper.ZooKeeperWatcher(313): split-log-manager-tests58920b37-7850-44e5-8b97-871caff81fdb-0x14a8d231db7, quorum=localhost:60541, baseZNode=/hbase Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/splitWAL/testRecovery 2014-12-27 19:04:56,584 INFO [Thread-8] master.TestSplitLogManager(653): Mode2=LOG_SPLITTING 2014-12-27 19:04:56,584 DEBUG [Thread-8] coordination.ZKSplitLogManagerCoordination(870): Distributed log replay=true 2014-12-27 19:04:56,585 WARN [main-EventThread] coordination.ZKSplitLogManagerCoordination$GetDataAsyncCallback(996): task znode /hbase/splitWAL/testRecovery vanished or not created yet. 2014-12-27 19:04:56,585 INFO [main-EventThread] coordination.ZKSplitLogManagerCoordination(472): task /hbase/splitWAL/RESCAN01 entered state: DONE dummy-master,1,1 {code} From the above log we can see that by the time the following is called (line 654 in test): {code} slm.setRecoveryMode(false); {code} the split task was not done - it entered done state 1 millisecond later. So ZKSplitLogManagerCoordination#hasSplitLogTask was true and isForInitialization parameter is false, leading to the execution of the following branch: {code} } else if (!isForInitialization) { // splitlogtask hasn't drained yet, keep existing recovery mode return; {code} Thus recoveryMode was left in LOG_SPLITTING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
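The usual remedy for this kind of timing-sensitive assertion is to poll for the expected state with a deadline rather than asserting immediately; a generic sketch follows, not the actual fix in HBASE-13136.

{code}
public class WaitFor {
  interface Condition {
    boolean holds() throws Exception;
  }

  // Poll until the condition holds or the deadline passes, so the test does
  // not assert on state that another thread updates milliseconds later.
  static boolean waitFor(long timeoutMs, long intervalMs, Condition c) throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (c.holds()) {
        return true;
      }
      Thread.sleep(intervalMs);
    }
    return c.holds();
  }
}
{code}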
[jira] [Resolved] (HBASE-13064) Fix flakey TestEnableTableHandler
[ https://issues.apache.org/jira/browse/HBASE-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Stepachev resolved HBASE-13064. -- Resolution: Duplicate Release Note: seems fixed by HBASE-13076 and other issues. Fix flakey TestEnableTableHandler - Key: HBASE-13064 URL: https://issues.apache.org/jira/browse/HBASE-13064 Project: HBase Issue Type: Bug Reporter: Andrey Stepachev Assignee: Andrey Stepachev -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-3743) Throttle major compaction
[ https://issues.apache.org/jira/browse/HBASE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh resolved HBASE-3743. --- Resolution: Duplicate This is essentially a dupe of HBASE-8329 and HBASE-5867, both of which are closed out. Throttle major compaction - Key: HBASE-3743 URL: https://issues.apache.org/jira/browse/HBASE-3743 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Joep Rottinghuis Add the ability to throttle major compaction. For those use cases when a stop-the-world approach is not practical, it is useful to be able to throttle the impact that major compaction has on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Jira role cleanup
On Fri, Mar 13, 2015 at 11:01 AM, Andrew Purtell apurt...@apache.org wrote: +1 I think it would be fine to trim the contributor list too. We can always add people back on demand in order to (re)assign issues. I wasn't sure how we generate the list of contributors. But then I noticed that we don't link to jira for it like I thought we did[1]. How about I make a saved jira query for people who have had jira's assigned to them, add a link to that query for our here are the contributors section, and then trim off from the role anyone who hasn't been assigned an issue in the last year? [1]: http://hbase.apache.org/team-list.html -- Sean
Re: [DISCUSS] Dependency compatibility
I'm -1 (non-binding) on weakening our compatibility promises. The more we can isolate our users from the impact of changes upstream the better. We can't though in general. Making compatibility promises we can't keep because our upstreams don't (see the dependencies section of Hadoop's compatibility guidelines) is ultimately an untenable position. *If* we had some complete dependency isolation for MapReduce and coprocessors committed then this could be a different conversation. Am I misstating this? In this specific instance we do have another option, so we could defer this to a later time when a really unavoidable dependency change happens... like a Guava update affecting HDFS. (We had one of those before.) We can document the Jackson classpath issue with Hadoop = 2.6 and provide remediation advice in the troubleshooting section of the manual. I would be disappointed to see a VOTE thread. That means we failed to reach consensus and needed to fall back to process to resolve differences. Why don't we do the doc update and call it a day? On Thu, Mar 12, 2015 at 7:32 PM, Sean Busbey bus...@cloudera.com wrote: On Thu, Mar 12, 2015 at 1:20 PM, Enis Söztutar enis@gmail.com wrote: This is a good discussion, but I would like to reach a consensus and move on with HBASE-13149. My conclusion from the thread is that we cannot be realistically expected to keep dependency compat between minor versions because of lack of shading in HBase and Hadoop, and our dependencies are themselves not semver, and we cannot promise more than our dependencies promise. So I would like to formally drop support for dependency compat between minor versions as defined in https://hbase.apache.org/book.html#hbase.versioning. We can reintroduce later when we have better isolation/guarantees. In the meantime, we can move on. I'm -1 (non-binding) on weakening our compatibility promises. The more we can isolate our users from the impact of changes upstream the better. If our dependencies aren't semver it's all the more reason for us to be disciplined about 1) accepting them as exposed in the first place and 2) changing them. This is the first problem that has presented itself in the face of the restrictions we adopted as a community. I don't care for the precedent of us solving it by weakening those promises. For one thing, it reinforces messaging from vendors that folks need them to protect against choices individual projects make. If I have a solution that works to separate us from Hadoop when running on YARN in Hadoop 2.6+ before 1.1 is ready, can we keep our compat at the current strength? Some other deadline? AFAIK, we have no direct need for Jackson 1.9. The PMC has approved the compat guide, but I am not sure whether we need a VOTE thread. What do you guys think? My main concern with not having a VOTE thread is that some PMC members might be more likely to pay attention to the matter if there's a VOTE, so a DISCUSS thread might only show consensus among a subset. Changes to the promises we make downstream are a big deal, so I'd prefer to err on the side of more participation. I'd also really like that VOTE thread to include user@hbase, since this impacts downstream users more than us directly. -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Jira role cleanup
+1 I think it would be fine to trim the contributor list too. We can always add people back on demand in order to (re)assign issues. On Fri, Mar 13, 2015 at 8:32 AM, Sean Busbey bus...@cloudera.com wrote: FYI, some time next week I'm going to try to simplify our jira role list. When I go to add new contributors it takes forever, I'm guessing because of the list size. I think I can trim it down some by making sure folks who are committers are in the committer list and not the contributor list. AFAICT, our jira is set to allow committers to do a proper superset of what contributors can do. Some folks are already listed in both. I'll send a summary email warning about what new powers folks have if there is anyone who's currently only in the contributor list. -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Jira role cleanup
How about I make a saved jira query for people who have had jira's assigned to them, add a link to that query for our here are the contributors section, and then trim off from the role anyone who hasn't been assigned an issue in the last year? That sounds like a very fair proposition. The JIRA role list isn't public I think, but just in case. Anyway, there's every reason to call out our contributors publicly and offer acknowledgement as thanks. On Fri, Mar 13, 2015 at 9:09 AM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 11:01 AM, Andrew Purtell apurt...@apache.org wrote: +1 I think it would be fine to trim the contributor list too. We can always add people back on demand in order to (re)assign issues. I wasn't sure how we generate the list of contributors. But then I noticed that we don't link to jira for it like I thought we did[1]. How about I make a saved jira query for people who have had jira's assigned to them, add a link to that query for our here are the contributors section, and then trim off from the role anyone who hasn't been assigned an issue in the last year? [1]: http://hbase.apache.org/team-list.html -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: [DISCUSS] Dependency compatibility
On Fri, Mar 13, 2015 at 11:18 AM, Andrew Purtell apurt...@apache.org wrote: I'm -1 (non-binding) on weakening our compatibility promises. The more we can isolate our users from the impact of changes upstream the better. We can't though in general. Making compatibility promises we can't keep because our upstreams don't (see the dependencies section of Hadoop's compatibility guidelines) is ultimately an untenable position. *If* we had some complete dependency isolation for MapReduce and coprocessors committed then this could be a different conversation. Am I misstating this? In this specific instance we do have another option, so we could defer this to a later time when a really unavoidable dependency change happens... like a Guava update affecting HDFS. (We had one of those before.) We can document the Jackson classpath issue with Hadoop = 2.6 and provide remediation advice in the troubleshooting section of the manual. I think we can solve this generally for Hadoop 2.6.0+. There's no reason our HDFS usage should be exposed in the HBase client code, and I think the application classpath feature for YARN in that version can isolate us on the MR side. I am willing to do this work in time for 1.1. Realistically I don't know the timeline for that version yet. If it turns out the work is more involved or my time is more constrained than I think, I'm willing to accept promise weakening as a practical matter. I'd be much more comfortable weakening our dependency promises for coprocessors than doing it in general. Folks running coprocessors should already be more risk tolerant and familiar with our internals. For upstreams that don't have the leverage on us of Hadoop, we solve this problem simply by not updating dependencies that we can't trust to not break our downstreams. I would be disappointed to see a VOTE thread. That means we failed to reach consensus and needed to fall back to process to resolve differences. That's fair. What about the wider audience issue on user@? There's no reason our DISCUSS threads couldn't go there as well. Why don't we do the doc update and call it a day? I've been burned by dependency changes in projects I rely on many times in the past, usually over changes in code sections that folks didn't think were likely to be used. So I'm very willing to do work now to save downstream users of HBase that same headache. -- Sean
[jira] [Created] (HBASE-13231) shell script rewrite
Allen Wittenauer created HBASE-13231: Summary: shell script rewrite Key: HBASE-13231 URL: https://issues.apache.org/jira/browse/HBASE-13231 Project: HBase Issue Type: New Feature Components: scripts, shell Affects Versions: 2.0.0 Reporter: Allen Wittenauer This JIRA is for updating the HBase shell code to something remotely modern. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] Dependency compatibility
There's no reason our HDFS usage should be exposed in the HBase client code. I did look at this in the past; IIRC, our dependency was that we use hadoop-common code to read our XML configuration files. I would +1 a code duplication to remove the dependency. I also think it is important for the end user. On Fri, Mar 13, 2015 at 5:43 PM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 11:18 AM, Andrew Purtell apurt...@apache.org wrote: I'm -1 (non-binding) on weakening our compatibility promises. The more we can isolate our users from the impact of changes upstream the better. We can't though in general. Making compatibility promises we can't keep because our upstreams don't (see the dependencies section of Hadoop's compatibility guidelines) is ultimately an untenable position. *If* we had some complete dependency isolation for MapReduce and coprocessors committed then this could be a different conversation. Am I misstating this? In this specific instance we do have another option, so we could defer this to a later time when a really unavoidable dependency change happens... like a Guava update affecting HDFS. (We had one of those before.) We can document the Jackson classpath issue with Hadoop = 2.6 and provide remediation advice in the troubleshooting section of the manual. I think we can solve this generally for Hadoop 2.6.0+. There's no reason our HDFS usage should be exposed in the HBase client code, and I think the application classpath feature for YARN in that version can isolate us on the MR side. I am willing to do this work in time for 1.1. Realistically I don't know the timeline for that version yet. If it turns out the work is more involved or my time is more constrained than I think, I'm willing to accept promise weakening as a practical matter. I'd be much more comfortable weakening our dependency promises for coprocessors than doing it in general. Folks running coprocessors should already be more risk tolerant and familiar with our internals. For upstreams that don't have the leverage on us of Hadoop, we solve this problem simply by not updating dependencies that we can't trust to not break our downstreams. I would be disappointed to see a VOTE thread. That means we failed to reach consensus and needed to fall back to process to resolve differences. That's fair. What about the wider audience issue on user@? There's no reason our DISCUSS threads couldn't go there as well. Why don't we do the doc update and call it a day? I've been burned by dependency changes in projects I rely on many times in the past, usually over changes in code sections that folks didn't think were likely to be used. So I'm very willing to do work now to save downstream users of HBase that same headache. -- Sean
Re: [DISCUSS] Dependency compatibility
There's no reason our HDFS usage should be exposed in the HBase client code, and I think the application classpath feature for YARN in that version can isolate us on the MR side. I was thinking more of the case where we have to bump our version of Guava because our version and Hadoop's version are mutually incompatible, causing compilation failures or runtime failures or both. This was a thing once. Would it be possible to have different dependencies specified for client and server Maven projects? I suppose we could hack this, though it would be ugly. On Fri, Mar 13, 2015 at 9:43 AM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 11:18 AM, Andrew Purtell apurt...@apache.org wrote: I'm -1 (non-binding) on weakening our compatibility promises. The more we can isolate our users from the impact of changes upstream the better. We can't though in general. Making compatibility promises we can't keep because our upstreams don't (see the dependencies section of Hadoop's compatibility guidelines) is ultimately an untenable position. *If* we had some complete dependency isolation for MapReduce and coprocessors committed then this could be a different conversation. Am I misstating this? In this specific instance we do have another option, so we could defer this to a later time when a really unavoidable dependency change happens... like a Guava update affecting HDFS. (We had one of those before.) We can document the Jackson classpath issue with Hadoop = 2.6 and provide remediation advice in the troubleshooting section of the manual. I think we can solve this generally for Hadoop 2.6.0+. There's no reason our HDFS usage should be exposed in the HBase client code, and I think the application classpath feature for YARN in that version can isolate us on the MR side. I am willing to do this work in time for 1.1. Realistically I don't know the timeline for that version yet. If it turns out the work is more involved or my time is more constrained than I think, I'm willing to accept promise weakening as a practical matter. I'd be much more comfortable weakening our dependency promises for coprocessors than doing it in general. Folks running coprocessors should already be more risk tolerant and familiar with our internals. For upstreams that don't have the leverage on us of Hadoop, we solve this problem simply by not updating dependencies that we can't trust to not break our downstreams. I would be disappointed to see a VOTE thread. That means we failed to reach consensus and needed to fall back to process to resolve differences. That's fair. What about the wider audience issue on user@? There's no reason our DISCUSS threads couldn't go there as well. Why don't we do the doc update and call it a day? I've been burned by dependency changes in projects I rely on many times in the past, usually over changes in code sections that folks didn't think were likely to be used. So I'm very willing to do work now to save downstream users of HBase that same headache. -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
[jira] [Created] (HBASE-13232) ConnectionManger : Batch pool threads and metaLookup pool threads should use different name pattern
Anoop Sam John created HBASE-13232: -- Summary: ConnectionManger : Batch pool threads and metaLookup pool threads should use different name pattern Key: HBASE-13232 URL: https://issues.apache.org/jira/browse/HBASE-13232 Project: HBase Issue Type: Bug Reporter: Anoop Sam John Assignee: Anoop Sam John Priority: Trivial Fix For: 2.0.0, 1.1.0 This is a small issue that happened with HBASE-13036. Different names are passed to getThreadPool as nameHint but the hint is not being used. Found this small issue while checking HBASE-13219. -- This message was sent by Atlassian JIRA (v6.3.4#6332)