Re: HBase 0.94 - TableMapReduceUtil class, overloaded method "initTableMapperJob" behaves differently

2014-04-07 Thread Nick Dimiduk
I vaguely recall a ticket being raised and committed for this problem, but
the bug is clearly present and I'm not finding the issue. Maybe @larsh
recalls?


On Mon, Apr 7, 2014 at 2:48 PM, Ted Yu  wrote:

> You are talking about this method, right ?
>
>   public static void initTableMapperJob(List<Scan> scans,
>   Class<? extends TableMapper> mapper,
>   Class<? extends WritableComparable> outputKeyClass,
>   Class<? extends Writable> outputValueClass, Job job,
>   boolean addDependencyJars) throws IOException {
>
> Looks like it should call initCredentials(job) as well.
>
>
> On Mon, Apr 7, 2014 at 2:37 PM, Jim Huang  wrote:
>
> > I am looking at the 0.94 branch and I have noticed that the overloaded
> > methods for "initTableMapperJob" behave differently: one of them
> > calls "initCredentials(job);" while the others don't.  For people who
> > are using Kerberos security on their cluster, the initCredentials(job)
> sets
> > up the HBase security tokens for all the mapper tasks only for that
> > overloaded method.  Is there a specific reason why this could be
> > intentional?  Otherwise, I would like to create a new JIRA to see if I
> can
> > work on this as a newbie issue.
> >
> > Thanks for any pointer.
> > Jim
> >
>
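The delegation pattern that avoids this class of bug is to funnel every convenience overload through one canonical method, so shared setup such as credential initialization lives in exactly one place. A minimal sketch (toy Java with hypothetical names, not the HBase implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of overload delegation: every convenience overload funnels
// through one canonical method, so shared setup such as credential
// initialization cannot be skipped by accident.
public class OverloadFunnel {
    static final List<String> calls = new ArrayList<>();

    // Convenience overload: supplies a default and delegates.
    static void initJob(String table) {
        initJob(table, true);
    }

    // Canonical overload: all shared setup lives here, once.
    static void initJob(String table, boolean addDependencyJars) {
        calls.add("configure:" + table);
        if (addDependencyJars) {
            calls.add("addJars");
        }
        initCredentials(); // every code path reaches this call
    }

    static void initCredentials() {
        calls.add("initCredentials");
    }

    public static void main(String[] args) {
        initJob("t1");
        initJob("t2", false);
        long n = calls.stream().filter("initCredentials"::equals).count();
        if (n != 2) throw new AssertionError("expected 2 credential inits, got " + n);
        System.out.println("credential setup ran for every overload");
    }
}
```

In the 0.94 code, by contrast, only one multi-arg variant calls initCredentials(job) directly, which is how the others were able to drift.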


Re: blockcache 101

2014-04-09 Thread Nick Dimiduk
Stack:

Did you take measure of average/mean response times doing your blockcache
> comparison?


Yes, in total I collected mean, 50%, 95%, 99%, and 99.9% latency
values. I only performed the analysis over the 99% in the post. I also looked
briefly at the 99.9%, but that wasn't immediately relevant to the
context of the experiment. All of these data are included in the "raw
results" csv I uploaded and linked from the "Showdown" post.

do you need more proof bucketcache subsumes slabcache?


I'd like more vetting, yes. As you alluded to in the previous question, a
more holistic view of response times would be good, and also I'd like to
see how they perform with a mixed workload. The next step is probably to
exercise them with some YCSB workloads of varying RAM:DB ratios.

Todd:

the trend lines drawn on the graphs seem to be based on some assumption
> that there is an exponential scaling pattern.


Which charts are you specifically referring to? Indeed, the trend lines
were generated rather casually with Excel and may be misleading. Perhaps a
more responsible representation would be to simply connect each data point
with a line to aid visibility.

In practice I would think it would be sigmoid [...] As soon as it starts to
> be larger than the cache capacity [...] as the dataset gets larger, the
> latency will level out as a flat line, not continue to grow as your trend
> lines are showing.


When decoupling cache size from database size, you're presumably correct. I
believe that's what's shown in the figures in perfeval_blockcache_v1.pdf,
especially as total memory increases. The plateau effect is suggested in
the 20G and 50G charts in that document. This is why I included the second set
of charts in perfeval_blockcache_v2.pdf. The intention is to couple the
cache size to dataset size and demonstrate how an implementation performs
as the absolute values increase. That is, assuming hit and eviction rates
remain roughly constant, how well does an implementation "scale up" to a
larger memory footprint?
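The "scale up" question can be framed with a back-of-the-envelope expected-latency model. The numbers below are hypothetical placeholders, not measurements from the experiment; the point is only that once the dataset dwarfs the cache, the hit rate (and so the expected latency) flattens out:

```java
// Simple model: expected latency = hitRate * hitCost + (1 - hitRate) * missCost.
// All costs below are assumed values for illustration, not measured results.
public class CacheLatencyModel {
    static double expectedLatencyMicros(double hitRate, double hitUs, double missUs) {
        return hitRate * hitUs + (1.0 - hitRate) * missUs;
    }

    public static void main(String[] args) {
        double hitUs = 50;    // hypothetical block-cache hit cost
        double missUs = 5000; // hypothetical disk-read miss cost
        double cacheGb = 20;  // "memory under management"
        for (double dbGb : new double[] {10, 20, 50, 100, 1000}) {
            // Crude uniform-access assumption: hit rate ~ cache/db, capped at 1.
            double hitRate = Math.min(1.0, cacheGb / dbGb);
            System.out.printf("db=%6.0fG hit=%.2f expected=%7.1fus%n",
                dbGb, hitRate, expectedLatencyMicros(hitRate, hitUs, missUs));
        }
    }
}
```

As dbGb grows with cacheGb fixed, the expected value approaches missUs as a flat line, which is the sigmoid plateau Todd describes.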

-n


Disambiguate cell time for 1.0

2014-04-10 Thread Nick Dimiduk
Reading through the write path, it seems to me that
RSRpcServices#doBatchOp(RegionActionResult.Builder, HRegion,
List<ClientProtos.Action> mutations, CellScanner) should be honoring a
nonce if present. The reason
being: if a client sends some Puts without specifying a TS, it relies on
the RS to provide one. Should such an operation succeed on the server but
the ACK not reach the client, client may resend the operation, silently
inserting more cells than intended. Deletes may well be a more sinister
issue, removing more cells than intended.

I've not yet written a test to confirm this.
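To make the hazard concrete, here is a toy model (plain Java with hypothetical names, not HBase code) of a server that assigns timestamps to Puts. Without a nonce, a retried Put lands as a second cell; with a server-side nonce check, the retry is recognized and dropped:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy sketch of the retry hazard: when the server assigns the timestamp, a
// resent Put lands as a *second* cell instead of overwriting the first.
// Honoring a nonce would make the retry idempotent.
public class NonceSketch {
    static class Cell {
        final String row; final long ts; final String value;
        Cell(String row, long ts, String value) {
            this.row = row; this.ts = ts; this.value = value;
        }
    }

    static final List<Cell> store = new ArrayList<>();
    static final Set<Long> seenNonces = new HashSet<>();
    static long clock = 0;

    // Server-side handling of a Put that carries no client timestamp.
    static void put(String row, String value, Long nonce) {
        if (nonce != null && !seenNonces.add(nonce)) {
            return; // retry of an already-applied operation; drop it
        }
        store.add(new Cell(row, ++clock, value)); // server picks the TS
    }

    static long cellCount(String row) {
        return store.stream().filter(c -> c.row.equals(row)).count();
    }

    public static void main(String[] args) {
        // ACK lost, client resends: without a nonce we get two cells.
        put("r1", "v", null);
        put("r1", "v", null); // retry
        // With a nonce, the retry is deduplicated.
        put("r2", "v", 42L);
        put("r2", "v", 42L); // retry
        if (cellCount("r1") != 2 || cellCount("r2") != 1) {
            throw new AssertionError();
        }
        System.out.println("r1=" + cellCount("r1") + " cells, r2=" + cellCount("r2") + " cell");
    }
}
```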

There was conversation around the implementation of nonces discussing
options for removing the coupling of TS to clock-on-the-wall time. Sergey
describes the current situation quite eloquently: "server-generated TS
provide illusion of consistency guarantees which is not there by any
means". A fix for this will likely subtly break the semantics of our data
coordinates, and so should be addressed with 1.0, perhaps alongside a
revamped client-side API.

-n


Re: Disambiguate cell time for 1.0

2014-04-11 Thread Nick Dimiduk
Yes, 10247 is precisely where we should have this discussion. Thanks for
pointing it out.


On Thu, Apr 10, 2014 at 8:22 PM, lars hofhansl  wrote:

> Should we discuss this here:
> https://issues.apache.org/jira/browse/HBASE-10247?
>
>
>
> 
>  From: Sergey Shelukhin 
> To: dev@hbase.apache.org
> Sent: Thursday, April 10, 2014 6:16 PM
> Subject: Re: Disambiguate cell time for 1.0
>
>
> Adding nonces to deletes and puts is possible, but it is overhead...
>
>
>
> On Thu, Apr 10, 2014 at 6:16 PM, Sergey Shelukhin  >wrote:
>
> > Just to clarify what I was saying, there was a JIRA where we were
> > discussing this.
> > Server clocks are unreliable and when region moves (or if we have
> writable
> replicas) between-server clocks are even more so.
> > So we cannot really promise consistency wrt order even for requests from
> > the same client now; even if we manage TS 100%.
> > My suggestion was that we have 3 modes: (1) we manage TS on server => TS
> > gets written as seqNum which has guarantees, no client-supplied TS; (2)
> > client manages TS and supplies it, fail put if no TS; (3) legacy compat
> > mode, but TS is generated in the client library instead of RS, so the
> onus
> > is on the client app/client's server to manage clocks and there's no
> > expectation that HBase will guarantee order.
> >
> >
> >
> > On Thu, Apr 10, 2014 at 5:09 PM, Nick Dimiduk 
> wrote:
> >
> >> Reading through the write path, it seems to me that
> >> RSRpcServices#doBatchOp(RegionActionResult.Builder, HRegion,
> >> List<ClientProtos.Action> mutations, CellScanner) should be honoring a
> >> nonce if present. The reason
> >> being: if a client sends some Puts without specifying a TS, it relies on
> >> the RS to provide one. Should such an operation succeed on the server
> but
> >> the ACK not reach the client, client may resend the operation, silently
> >> inserting more cells than intended. Deletes may well be a more sinister
> >> issue, removing more cells than intended.
> >>
> >> I've not yet written a test to confirm this.
> >>
> >> There was conversation around the implementation of nonces discussing
> >> options for removing the coupling of TS to clock-on-the-wall time.
> Sergey
> >> describes the current situation quite eloquently: "server-generated TS
> >> provide illusion of consistency guarantees which is not there by any
> >> means". A fix for this will likely subtly break the semantics of our
> data
> >> coordinates, and so should be addressed with 1.0, perhaps alongside a
> >> revamped client-side API.
> >>
> >> -n
> >>
> >
> >
>


Re: blockcache 101

2014-04-15 Thread Nick Dimiduk
On Mon, Apr 14, 2014 at 10:12 PM, Todd Lipcon  wrote:

>
> Hmm... in "v2.pdf" here you're looking at different ratios of DB size
> to cache size, but there's also the secondary cache on the system (the
> OS block cache), right?


Yes, this is true.

So when you say only 20GB "memory under management", in fact you're still
> probably getting 100% hit rate on the case where the DB is bigger than RAM,
> right?
>

I can speculate, likely that's true, but I don't know this for certain. At
the moment, the only points of instrumentation in the harness are in the
HBase client. The next steps include pushing instrumentation down into the
RS and DN, and further still into the OS itself.

Maybe would be better to have each graph show the different cache
> implementations overlaid, rather than the different ratios overlaid? That
> would better differentiate the scaling behavior of the implementations vs
> each other.


I did experiment with that initially. I found the graphs became dense and
unreadable. I need to spend more time studying Tufte to present all these
data points in a single figure. The data is all included, so please, by all
means have a crack at it. Maybe you'll see something I didn't.

 As you've got it, the results seem somewhat obvious ("as the hit ratio
> gets worse, it gets slower").
>

Yes, that's true. Of interest in this particular experiment was the
relative performance of different caches under identical workloads.


Re: anyone interested in openhft?

2014-04-15 Thread Nick Dimiduk
Yes, now that we have memstore abstracted, this is something worth
experimenting with.

On Tuesday, April 15, 2014, Li Li  wrote:

> http://www.infoq.com/articles/Open-JDK-and-HashMap-Off-Heap
>
> http://openhft.blogspot.com/2014/03/javautilconcurrentconcurrenthashmap-vs.html
> I found this offheap solution and remember HBase faces GC problems with
> a large heap
>


Re: 0.99.0 and 1.0.0 targets in Jira

2014-04-16 Thread Nick Dimiduk
On Wed, Apr 16, 2014 at 9:28 PM, Jonathan Hsieh  wrote:

> I'd prefer if we got rid of one -- maybe mark all as 0.99 and remove 1.0.0.
>  When we branch 1.0, we rename 0.99 to 1.0 and create a 1.1-SNAPSHOT branch
> and make trunk 2.0-SNAPSHOT (or 1.99-SNAPSHOT?).
>

Wouldn't trunk become 1.1-SNAPSHOT?

 On Wed, Apr 16, 2014 at 9:11 PM, lars hofhansl  wrote:
>
> > I see we have both targets in Jira and some issues targeted to 0.99.0 and
> > some to 1.0.0.
> >
> > Which one should we use?
> >
> > -- Lars
> >
>
>
>
> --
> // Jonathan Hsieh (shay)
> // HBase Tech Lead, Software Engineer, Cloudera
> // j...@cloudera.com // @jmhsieh
>


Re: Bringing consensus based strong consistency into HBase

2014-04-18 Thread Nick Dimiduk
For future reference, attachments are filtered out by the mailing list. For
working drafts, we've had success using a shared editing service (Google
Docs, specifically). Best to attach later-stage docs to JIRAs.


On Thu, Apr 17, 2014 at 9:29 PM, Mikhail Antonov wrote:

> Guys,
>
> attached is a reworked version of this document which includes:
>
>
>- "starting fresh" approach to consensus operations in HBase
>- benefits it could bring
>- code pointers and list of modifications required to abstract ZK
>
> I'd appreciate review and feedback! The document is also attached to
> HBASE-10909 for reference and comments.
>
> Thanks,
>
> Mikhail
>
>
> 2014-04-09 16:33 GMT-07:00 Mikhail Antonov :
>
> Guys,
>>
>> as was advised, I moved the PDF document to umbrella jira, here -
>> https://issues.apache.org/jira/browse/HBASE-10909.
>>
>> Thanks,
>> Mikhail
>>
>>
>> 2014-04-08 8:34 GMT-07:00 Andrew Purtell :
>>
>> Hi Cos,
>>>
>>> Thanks for providing this overview, and more importantly, considering
>>> these
>>> contributions. Reading through the relevant JIRAs, looks to me like HBase
>>> will be in a better position than before to have this abstraction in
>>> place,
>>> provided the changes are fully completed.
>>>
>>> Thanks again for the kind consideration.
>>>
>>>
>>> On Mon, Apr 7, 2014 at 4:15 PM, Konstantin Boudnik 
>>> wrote:
>>>
>>> > Guys,
>>> >
>>> > As some of you might have noticed there is a number of new JIRAs opened
>>> > recently that are aiming at abstracting and separating ZK out of HBase
>>> guts
>>> > and making it an implementation detail, rather than a center of
>>> attention
>>> > for some parts of the HBase.
>>> >
>>> > I would like to send around a short document written by Mikhail Antonov
>>> > that
>>> > is trying to clarify a couple of points about this whole effort.
>>> > People are asking me about this off-line, so I decided to send this
>>> around
>>> > so
>>> > we can have a lively and wider discussion about this initiative and the
>>> > development.
>>> >
>>> > Here's the JIRA and the link to the PDF document.
>>> >   https://issues.apache.org/jira/browse/HBASE-10866
>>> >
>>> >
>>> https://issues.apache.org/jira/secure/attachment/12637957/HBaseConsensus.pdf
>>> >
>>> > The umbrella JIRA for this effort is here
>>> >   https://issues.apache.org/jira/browse/HBASE-10909
>>> >
>>> > --
>>> > Regards,
>>> >   Cos
>>> >
>>> >
>>>
>>>
>>> --
>>> Best regards,
>>>
>>>- Andy
>>>
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
>>>
>>
>>
>>
>> --
>> Thanks,
>> Michael Antonov
>>
>
>
>
> --
> Thanks,
> Michael Antonov
>


Re: I have one question of HBase Client

2014-04-23 Thread Nick Dimiduk
The array-of-puts method (HTable#put(List<Put>), actually) and
HTable#batch(List<? extends Row>, Object[]) are very similar. Both use
AsyncProcess, an asynchronous means of sending their work to the cluster.
The primary difference is that the former is more respectful about managing
back-pressure from the cluster. The latter allows you to send other
mutation types in bulk, so it's also applicable to Append, Increment, and
Delete, in addition to Put.

Neither method involves client interaction with the Master. Zookeeper may
be invoked to locate the META region, and the RegionServer hosting META
will be referenced to populate any uncached region locations.


On Fri, Apr 18, 2014 at 9:26 AM, Ted Yu  wrote:

> #3 is distinctive from the other two choices. See javadoc for RowMutations
> :
>
>  * Performs multiple mutations atomically on a single row.
>
>  * Currently {@link Put} and {@link Delete} are supported.
>
> while the other two methods are not limited to a single row.
>
> Cheers
>
>
> On Fri, Apr 18, 2014 at 3:37 AM, Long kyo  wrote:
>
> > Hi all,
> > I know 3 ways to multi puts into HBase table:
> > 1. Array of Puts
> > 2  batch method
> > 3. mutateRow method
> >
> > Question:
> > 1. And the question here is which is best choice for multi puts, (speed
> and
> > performance)???
> > 2. And when multi put into hbase table, the client make one or multi
> > request to Hbase master??? (about tcp request,... not socket)
> >
> > Thanks All
> >
> >
> > --
> > Tạ Vũ Long (ター ヴー ロン)
> > Lớp IS1 - Việt Nhật
> > my contact number: 0984028809
> > email : kyo88...@gmail.com
> > yahoo:longdc2001
> >
>


Re: The builds.apache.org grind

2014-04-25 Thread Nick Dimiduk
On Fri, Apr 25, 2014 at 3:13 PM, Andrew Purtell  wrote:

> do we increase the tolerances for builds.apache.org and trade away the
> effectiveness of the test to catch real timing issues?
>

I wonder about this often.


Re: scan filtering for certain rows by column family

2014-05-11 Thread Nick Dimiduk
Does the addFamily method not solve your requirement?

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#addFamily(byte[])


On Tue, May 6, 2014 at 4:29 PM, cnpeyton  wrote:

> I want to do a scan of a table and only have rows that contain certain
> column families be returned from the scan.  For example, let's say I have 3
> column families a, b, and c.  I want to set a scan up with a filter that
> would only give me rows that have data within column families a and c.  For
> example, if row 2 has only data in column family b, then it should not be
> returned.
>
>
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/scan-filtering-for-certain-rows-by-column-family-tp4058919.html
> Sent from the HBase Developer mailing list archive at Nabble.com.
>


Call for Lightning Talks, Hadoop Summit HBase BoF

2014-05-13 Thread Nick Dimiduk
Hi HBasers!

Subash and I are organizing the HBase Birds of a Feather (BoF) session at
Hadoop Summit San Jose this year. We're looking for 4-5 brave souls willing
to standup for 15 minutes and tell the community what's working for them
and what isn't. Have a story about how this particular feature saved the
day? Great! Really wish something was implemented differently and have a
plan for fixing it? Step up and recruit folks to provide/review patches!

Either way, send me a note off-list and we'll get you queued up.

The event is on Thursday, June 5, 3:30p - 5:00p at the San Jose Convention
Center, room 230C. RSVP at the meetup page [0]. Please note that this event
is NOT exclusive to conference attendees. Come, come, one and all!

See you at the convention center!
Nick & Subash

[0]:
http://www.meetup.com/Hadoop-Summit-Community-San-Jose/events/179081342/


Re: [common type encoding breakout] Re: HBase Hackathon @ Salesforce 05/06/2014 notes

2014-05-13 Thread Nick Dimiduk
Breaking off hackathon thread.

The conversation around HBASE-8089 concluded with two points:
 - HBase should provide support for order-preserving encodings while not
dropping support for the existing encoding formats.
 - HBase is not in the business of schema management; that is a
responsibility left to application developers.

To handle the first point, OrderedBytes is provided; to support the
second, the DataType API is introduced. By introducing this layer above
specific encoding formats, it gives us a hook for plugging in different
implementations and for helper utilities to ship with HBase, such as
HBASE-10091.

Things get fuzzy around complex data types: pojos, compound rowkeys (a
special case of pojo), maps/dicts, and lists/arrays. These types are
composed of other types and have different requirements based on where in
the schema they're used. Again, by falling back on the DataType API, we
give application developers an "out" for doing what makes the most sense
for them.

For compound rowkeys, the Struct class is designed to fill in this gap,
sitting between data encoding and schema expression. It gives the
application implementer, the person managing the schema, enough flexibility
to express the key encoding in terms of the component types. These components
are not limited to the simple primitives already defined, but any DataType
implementation. Order preservation is likely important here.
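As a concrete illustration of what order preservation buys in a compound key, here is a minimal sketch in the spirit of Struct and OrderedBytes (not the actual HBase encoding): a fixed-width int with its sign bit flipped sorts correctly under unsigned byte comparison, and concatenating such fields yields a memcmp-able compound rowkey:

```java
import java.util.Arrays;

// Minimal order-preserving compound-key sketch. This is an assumption-laden
// toy, not the OrderedBytes/Struct implementation itself.
public class CompoundKeySketch {
    static byte[] encodeInt(int v) {
        int u = v ^ 0x80000000; // flip sign bit so negatives sort first
        return new byte[] {
            (byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u };
    }

    // Concatenate two fixed-width fields into one rowkey.
    static byte[] encodeKey(int a, int b) {
        byte[] x = encodeInt(a), y = encodeInt(b);
        byte[] key = Arrays.copyOf(x, x.length + y.length);
        System.arraycopy(y, 0, key, x.length, y.length);
        return key;
    }

    // Unsigned lexicographic comparison, as HBase compares rowkeys.
    static int memcmp(byte[] l, byte[] r) {
        for (int i = 0; i < Math.min(l.length, r.length); i++) {
            int d = (l[i] & 0xff) - (r[i] & 0xff);
            if (d != 0) return d;
        }
        return l.length - r.length;
    }

    public static void main(String[] args) {
        // (-5, 7) < (-5, 9) < (3, 0): byte order matches logical field order.
        if (memcmp(encodeKey(-5, 7), encodeKey(-5, 9)) >= 0) throw new AssertionError();
        if (memcmp(encodeKey(-5, 9), encodeKey(3, 0)) >= 0) throw new AssertionError();
        System.out.println("compound keys sort in logical order");
    }
}
```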

For arrays/lists, there's no implementation yet, but you can see how it
might be done if you have a look at Struct. Order preservation may or may
not be important for arrays/lists.

The situation for maps/dicts is similar to arrays/lists. The one
complication is the case where you want to map to a column family. How can
these APIs support this thing?

Pojos are a little more complicated. Probably Struct is sufficient for
basic cases, but it doesn't support nice features like versioning -- these
are sacrificed in favor of order preservation. Luckily, there's plenty of
tools out there for this already: Avro, MessagePack, Protobuf, Thrift, &c.
There's no need to reinvent the wheel here. Application developers can
implement the DataType API backed by their management tool of choice. I
created HBASE-11161 and will post a patch shortly.

Specific comments about the Hackathon notes inline.

Thanks,
Nick

On Mon, May 12, 2014 at 5:01 PM, Jonathan Hsieh  wrote:

>
> Here's where I believe there is agreement:
> * basic memcmp numeric encodings for ints, floats/doubles
>

This is already provided by OrderedBytes and the DataType implementations
Ordered{Float32,Float64,Int8,Int16,Int32,Int64,Numeric}.

* fixed scale decimal type
>

Provided by OrderedNumeric

* evolvability is highly desirable and thus tagged types of structs is
> desirable.  seemed like agreement for protobuf (which meets criteria)
> encoding for complex data types (records, arrays, lists with records, maps
> in a single cell)
>

Why not use protobuf directly instead of reimplementing a slight variation
of their format?

* no protobuf complex type encodings in rowkey (rowkey is like a struct but
> memcmp is critical).
>

Agreed. Struct is provided for this purpose.

* memcmp encodings for primitives in cells desired for phoenix (2ndary
> indices?)
>

This sounds like a Phoenix-specific decision.

* must support nulls in compound keys.
>

Struct offers this when the component types support it.

* this might be a separate module in hbase
>

This was discussed when HBASE-8089 was started and the consensus was to
place it into hbase-common. This can be reconsidered as necessary.

Here are the main discussion points that need follow-up (or where I'm
> not sure there was agreement):
> * for compound key encoding with nulls, do we need to distinguish null from
> ""? (phoenix emulates oracle, where they are same)
>

Null and "" are distinct in all JVM languages I'm aware of. We should not
preclude the possibility.

* compound key encoding of string/byte[]'s (how to handle \0)
>

OrderedBytes implements a bit-shifting strategy for this.
{FixedLength,Terminated}Wrapper are provided to add flexibility. Ryan has
suggested a variation of run-length encoding as another alternative,
something we could add if there's sufficient need.
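A small sketch (toy Java, not the OrderedBytes implementation) of why variable-length string fields need a terminator or escaping at all: naive concatenation can invert the sort order, while a 0x00 terminator restores it, provided the values themselves contain no 0x00, which is exactly the case the bit-shifting strategy handles:

```java
import java.nio.charset.StandardCharsets;

public class TerminatorSketch {
    // Join string fields into one key, optionally terminating each with 0x00.
    static byte[] join(boolean terminated, String... fields) {
        StringBuilder sb = new StringBuilder();
        for (String f : fields) {
            sb.append(f);
            if (terminated) sb.append('\u0000');
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Unsigned lexicographic comparison, as HBase compares rowkeys.
    static int memcmp(byte[] l, byte[] r) {
        for (int i = 0; i < Math.min(l.length, r.length); i++) {
            int d = (l[i] & 0xff) - (r[i] & 0xff);
            if (d != 0) return d;
        }
        return l.length - r.length;
    }

    public static void main(String[] args) {
        // Logically ("a","z") sorts before ("ab","a"), because "a" < "ab".
        // Naive concatenation compares "az" vs "aba" and gets it backwards.
        if (memcmp(join(false, "a", "z"), join(false, "ab", "a")) <= 0) {
            throw new AssertionError("expected naive form to sort incorrectly");
        }
        // With terminators we compare "a\0z\0" vs "ab\0a\0": correct order.
        if (memcmp(join(true, "a", "z"), join(true, "ab", "a")) >= 0) {
            throw new AssertionError("expected terminated form to sort correctly");
        }
        System.out.println("terminator restores compound-key sort order");
    }
}
```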

* do we include 1 byte and 2 byte ints?
>

Following the initial commit of HBASE-8201, these were requested in HBASE-9369.

* how to handle encodings of sql compound type like date (are they complex
> or primitives?)
> ** updated suggestion for date encodings.
>

Dates were not mentioned in previous discussions; would be good to have!

* Nick brings up some issues about the philosophy around
> o.a.h.h.types.DataType.. As I understand it, this datatype api has
> *extensibility* as the goal of being one api that could wrap many alternate
> encodings of data for hbase.


The above date question is a perfect example of why I think it's important
that we have the DataType interface. Having the interface means an
application can implement such a type itself.

Re: custom filter which extends Filter directly

2014-05-15 Thread Nick Dimiduk
+hbase-user


On Tue, May 13, 2014 at 7:57 PM, Ted Yu  wrote:

> To be a bit more specific (Filter is an interface in 0.94):
> If you use 0.96+ releases and your filter extends Filter directly, I would
> be curious to know your use case.
>
> Thanks
>
>
> On Tue, May 6, 2014 at 11:25 AM, Ted Yu  wrote:
>
> > Hi,
> > Filter is an abstract class.
> >
> > If your filter extends Filter directly, I would be curious to know your
> > use case.
> >
> > Thanks
> >
>


Re: Call for Lightning Talks, Hadoop Summit HBase BoF

2014-05-15 Thread Nick Dimiduk
Just to be clear, this is not a call for vendor pitches. This is a venue
for HBase users, operators, and developers to intermingle, share stories,
and storm new ideas.


On Tue, May 13, 2014 at 11:40 AM, Nick Dimiduk  wrote:

> Hi HBasers!
>
> Subash and I are organizing the HBase Birds of a Feather (BoF) session at
> Hadoop Summit San Jose this year. We're looking for 4-5 brave souls willing
> to standup for 15 minutes and tell the community what's working for them
> and what isn't. Have a story about how this particular feature saved the
> day? Great! Really wish something was implemented differently and have a
> plan for fixing it? Step up and recruit folks to provide/review patches!
>
> Either way, send me a note off-list and we'll get you queued up.
>
> The event is on Thursday, June 5, 3:30p - 5:00p at the San Jose Convention
> Center, room 230C. RSVP at the meetup page [0]. Please note that this event
> is NOT exclusive to conference attendees. Come, come, one and all!
>
> See you at the convention center!
> Nick & Subash
>
> [0]:
> http://www.meetup.com/Hadoop-Summit-Community-San-Jose/events/179081342/
>


Re: [common type encoding breakout] Re: HBase Hackathon @ Salesforce 05/06/2014 notes

2014-05-15 Thread Nick Dimiduk
On Tue, May 13, 2014 at 3:35 PM, Ryan Blue  wrote:


> I think there's a little confusion in what we are trying to accomplish.
> What I want to do is to write a minimal specification for how to store a
> set of types. I'm not trying to leave much flexibility, what I want is
> clarity and simplicity.
>

This is admirable and was my initial goal as well. The trouble is, you
cannot please everyone, current users and new. So, we decided it was better
to provide a pluggable framework for extension + some basic implementations
than to implement a closed system.

This is similar to OrderedBytes work, but a subset of it. A good example is
> that while it's possible to use different encodings (avro, protobuf,
> thrift, ...) it isn't practical for an application to support all of those
> encodings. So for interoperability between Kite, Phoenix, and others, I
> want a set of requirements that is as small as possible.
>

Minimal is good. The surface area of o.a.h.h.types is as large as it is
because there was always "just one more" type to support or encoding to
provide.

To make the requirements small, I used off-the-shelf protobuf [1] plus a
> small set of memcmp encodings: ints, floats, and binary. That way, we don't
> have to talk about how to make a memcmp Date in bytes, for example. A Date
> is an int, which we know how to encode, and we can agree separately on how
> a Date is represented (e.g., Julian vs unix epoch). [2] The same applies
> to binary, where the encoding handles sorting and nulls, but not charsets.
>

I think you should focus on the primitives you want to support. The
compound type stuff (ie, "rowkey encodings") is a can of worms because you
need to support existing users, new users, novice users, and advanced
users. Hence the interop between the DataType interface and the Struct
classes. These work together to support all of these use-cases with the
same basic code. For example, the protobuf encoding of position|wire-type +
encoded value is easily implemented using Struct.

I firmly believe that we cannot dictate rowkey composition. Applications,
however, are free to implement their own. By using the common DataType
interface, they can all interoperate.

This is the largest reason why I didn't include OrderedBytes directly in
> the spec. For example, OB includes a varint that I don't think is needed. I
> don't object to its inclusion in OB, but I think it isn't a necessary
> requirement for implementing this spec.
>

Again, the surface area is as it is because of community consensus during
the first phase of implementation. That consensus disagrees with you.

I think there are 3 things to clear up:
> 1. What types from OB are not included, and why?
> 2. Why not use OB-style structs?
> 3. Why choose protobuf for complex records?
>
> Does that sound like a reasonable direction to head with this discussion?
>

Yes, sounds great!

As far as the DataType API, I think that works great with what I'm trying
> to do. We'd build a DataType implementation for the encoding and the API
> will let applications handle the underlying encoding. And other encoding
> strategies can be swapped in as well, if we want to address shortcomings in
> this one, or have another for a different use case.
>

I'm quite pleased to hear that. Applications like Kite, Phoenix, Kiji are
the target audience of the DataType API.

Thank you for picking back up this baton. It's sat for too long.

-n

On 05/13/2014 02:33 PM, Nick Dimiduk wrote:
>
>> Breaking off hackathon thread.
>>
>> The conversation around HBASE-8089 concluded with two points:
>>   - HBase should provide support for order-preserving encodings while
>> not dropping support for the existing encoding formats.
>>   - HBase is not in the business of schema management; that is a
>> responsibility left to application developers.
>>
>> To handle the first point, OrderedBytes is provided. For the supporting
>> the second, the DataType API is introduced. By introducing this layer
>> above specific encoding formats, it gives us a hook for plugging in
>> different implementations and for helper utilities to ship with HBase,
>> such as HBASE-10091.
>>
>> Things get fuzzy around complex data types: pojos, compound rowkeys (a
>> special case of pojo), maps/dicts, and lists/arrays. These types are
>> composed of other types and have different requirements based on where
>> in the schema they're used. Again, by falling back on the DataType API,
>> we give application developers an "out" for doing what makes the most
>> sense for them.
>>
>> For compound rowkeys, the Struct class is designed to fill in this gap,
>> sitting between data encoding and schema expression.

Re: [common type encoding breakout] Re: HBase Hackathon @ Salesforce 05/06/2014 notes

2014-05-19 Thread Nick Dimiduk
On Thu, May 15, 2014 at 9:32 AM, James Taylor wrote:

> @Nick - I like the abstraction of the DataType, but that doesn't solve the
> problem for non Java usage.


That's true. It's very much a Java construct. Likewise, Struct only codes
for semantics; there's no encoding defined there. For correct
multi-language support, we'll need to define these semantics the same way
we do the encoding details so that implementations can reproduce them
faithfully.

I'm also a bit worried that it might become a bottleneck for implementors
> of the serialization spec as there are many different platform specific
> operations that will likely be done on the row key. We can try to get
> everything necessary in the DataType interface, but I suspect that
> implementors will need to go under-the-covers at times (rather than waiting
> for another release of the module that defines the DataType interface) -
> might become a bottleneck.
>

Time will tell. DataType is just an interface, after all. If there are
things it's missing (as there surely are, for Phoenix...), it'll need to be
extended locally until these features can be pushed down into HBase. HBase
release managers have been faithful to the monthly release train, so I
think in practice dependent projects won't have to wait long. I'm content
to take this on a case-by-case basis and watch for a trend. Do you have an
alternative idea?

On Wed, May 14, 2014 at 5:17 PM, Nick Dimiduk  wrote:
>
> > On Tue, May 13, 2014 at 3:35 PM, Ryan Blue  wrote:
> >
> >
> > > I think there's a little confusion in what we are trying to accomplish.
> > > What I want to do is to write a minimal specification for how to store
> a
> > > set of types. I'm not trying to leave much flexibility, what I want is
> > > clarity and simplicity.
> > >
> >
> > This is admirable and was my initial goal as well. The trouble is, you
> > cannot please everyone, current users and new. So, we decided it was
> better
> > to provide a pluggable framework for extension + some basic
> implementations
> > than to implement a closed system.
> >
> > This is similar to OrderedBytes work, but a subset of it. A good example
> is
> > > that while it's possible to use different encodings (avro, protobuf,
> > > thrift, ...) it isn't practical for an application to support all of
> > those
> > > encodings. So for interoperability between Kite, Phoenix, and others, I
> > > want a set of requirements that is as small as possible.
> > >
> >
> > Minimal is good. The surface area of o.a.h.h.types is as large as it is
> > because there was always "just one more" type to support or encoding to
> > provide.
> >
> > To make the requirements small, I used off-the-shelf protobuf [1] plus a
> > > small set of memcmp encodings: ints, floats, and binary. That way, we
> > don't
> > > have to talk about how to make a memcmp Date in bytes, for example. A
> > Date
> > > is an int, which we know how to encode, and we can agree separately on
> > how
> > > to a Date is represented (e.g., Julian vs unix epoch). [2] The same
> > applies
> > > to binary, where the encoding handles sorting and nulls, but not
> > charsets.
> > >
> >
> > I think you should focus on the primitives you want to support. The
> > compound type stuff (ie, "rowkey encodings") is a can of worms because
> you
> > need to support existing users, new users, novice users, and advanced
> > users. Hence the interop between the DataType interface and the Struct
> > classes. These work together to support all of these use-cases with the
> > same basic code. For example, the protobuf encoding of postion|wire-type
> +
> > encoded value is easily implemented using Struct.
> >
> > I firmly believe that we cannot dictate rowkey composition. Applications,
> > however, are free to implement their own. By using the common DataType
> > interface, they can all interoperate.
> >
> > This is the largest reason why I didn't include OrderedBytes directly in
> > > the spec. For example, OB includes a varint that I don't think is
> > needed. I
> > > don't object to its inclusion in OB, but I think it isn't a necessary
> > > requirement for implementing this spec.
> > >
> >
> > Again, the surface area is as it is because of community consensus during
> > the first phase of implementation. That consensus disagrees with you.
> >
> > I think there are 3 things to clear up:
> > > 1. What types from OB are not in

Re: VOTE: Move to GIT

2014-05-20 Thread Nick Dimiduk
+1 by me as well.

Just out of curiosity, is there a technical argument for why we should
stick with SVN? (maybe that was discussed on the other thread...)

On Tuesday, May 20, 2014, Stack  wrote:

> On Mon, May 19, 2014 at 2:41 PM, Enis Söztutar >
> wrote:
> >
> > I though the vote already passed, but I think it was the discussion
> thread.
> >
>
> It was DISCUSSION and it 'passed'. Talat was busy and just showed up again.
> He needs a vote to show INFRA.
> St.Ack
>


Re: [common type encoding breakout] Re: HBase Hackathon @ Salesforce 05/06/2014 notes

2014-05-20 Thread Nick Dimiduk
That's correct Andy. We're locking down the "default" primitive type
implementations going forward, while maintaining a flexible API such that
we can support existing users who want to migrate to the applicable new
features without rewriting existing data. Obviously some of those features
will depend on the new encoding semantics, but I think we can offer a net
improvement even for existing applications.


On Mon, May 19, 2014 at 6:31 AM, Andrew Purtell wrote:

> So if I can summarize this thread so far, we are going to try and hammer
> out a types encoding spec agreeable to HBase, Phoenix, and Kite alike? As
> opposed to select a particular implementation today as both spec and
> reference implementation. Is that correct?
>
> If so, that sounds like a promising direction. The HBase types library has
> the flexibility, if I understand Nick correctly, to accommodate whatever is
> agreed upon and we could then provide a reference implementation as a
> service for HBase users (or anyone) but there would be no strings attached,
> multiple implementations of the spec would interoperate by definition.
>
>
> > On May 19, 2014, at 3:20 AM, Nick Dimiduk  wrote:
> >
> > On Thu, May 15, 2014 at 9:32 AM, James Taylor  >wrote:
> >
> >> @Nick - I like the abstraction of the DataType, but that doesn't solve
> the
> >> problem for non Java usage.
> >
> >
> > That's true. It's very much a Java construct. Likewise, Struct only codes
> > for semantics; there's no encoding defined there. For correct
> > multi-language support, we'll need to define these semantics the same way
> > we do the encoding details so that implementations can reproduce them
> > faithfully.
> >
> > I'm also a bit worried that it might become a bottleneck for implementors
> >> of the serialization spec as there are many different platform specific
> >> operations that will likely be done on the row key. We can try to get
> >> everything necessary in the DataType interface, but I suspect that
> >> implementors will need to go under-the-covers at times (rather than
> waiting
> >> for another release of the module that defines the DataType interface) -
> >> might become a bottleneck.
> >
> > Time will tell. DataType is just an interface, after all. If there are
> > things it's missing (as there surely are, for Phoenix...), it'll need to
> be
> > extended locally until these features can be pushed down into HBase.
> HBase
> > release managers have been faithful to the monthly release train, so I
> > think in practice dependent projects won't have to wait long. I'm content
> > to take this on a case-by-case basis and watch for a trend. Do you have
> an
> > alternative idea?
> >
> >> On Wed, May 14, 2014 at 5:17 PM, Nick Dimiduk 
> wrote:
> >>
> >>> On Tue, May 13, 2014 at 3:35 PM, Ryan Blue  wrote:
> >>>
> >>>
> >>>> I think there's a little confusion in what we are trying to
> accomplish.
> >>>> What I want to do is to write a minimal specification for how to store
> >> a
> >>>> set of types. I'm not trying to leave much flexibility, what I want is
> >>>> clarity and simplicity.
> >>>
> >>> This is admirable and was my initial goal as well. The trouble is, you
> >>> cannot please everyone, current users and new. So, we decided it was
> >> better
> >>> to provide a pluggable framework for extension + some basic
> >> implementations
> >>> than to implement a closed system.
> >>>
> >>> This is similar to OrderedBytes work, but a subset of it. A good
> example
> >> is
> >>>> that while it's possible to use different encodings (avro, protobuf,
> >>>> thrift, ...) it isn't practical for an application to support all of
> >>> those
> >>>> encodings. So for interoperability between Kite, Phoenix, and others,
> I
> >>>> want a set of requirements that is as small as possible.
> >>>
> >>> Minimal is good. The surface area of o.a.h.h.types is as large as it is
> >>> because there was always "just one more" type to support or encoding to
> >>> provide.
> >>>
> >>> To make the requirements small, I used off-the-shelf protobuf [1] plus
> a
> >>>> small set of memcmp encodings: ints, floats, and binary. That way, we
> >>> don't
> >>>> have to talk about how to make a memcmp Date in byte

Re: ANNOUNCEMENT: Git Migration In Progress (WAS => Re: Git Migration)

2014-05-21 Thread Nick Dimiduk
Is there a resolution published somewhere? Do we have any ground rules for
making effective use of git? For instance, if we want to maintain linear
histories, we should enforce fast-forward only pushes. Do we allow forced
pushes to published branches? What about policies for feature branches --
should these preserve history? Will we squash the feature into a single
commit? &c.

Git is very nice, but also gives us more rope than we had before.

Perhaps this has some useful formulae:
http://git-scm.com/book/en/Customizing-Git-An-Example-Git-Enforced-Policy
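For instance, fast-forward-only and no-delete policies are stock git server-side settings; a sketch (assuming admin access to the shared bare repository, and demonstrated here against a throwaway repo):

```shell
# Create a throwaway bare repo to demonstrate; on a real server these two
# settings would be applied to the shared repository itself.
repo="$(mktemp -d)/demo.git"
git init --bare --quiet "$repo"
git -C "$repo" config receive.denyNonFastForwards true  # reject forced pushes
git -C "$repo" config receive.denyDeletes true          # reject branch deletion
git -C "$repo" config --get receive.denyNonFastForwards # prints "true"
```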


On Tue, May 20, 2014 at 10:10 PM, Talat Uyarer  wrote:

> Good news :)
> 21 May 2014 08:05 tarihinde "Stack"  yazdı:
>
> > SVN has been flipped read-only.  The migration to git has started.  See
> > https://issues.apache.org/jira/browse/INFRA-7768
> >
> > St.Ack
> >
> >
> > On Mon, May 19, 2014 at 3:56 PM, Stack  wrote:
> >
> > > On Mon, May 19, 2014 at 1:48 PM, Talat Uyarer 
> wrote:
> > >
> > >> Hi All,
> > >>
> > >> I created an issue for our git migrating. [0] We can follow our
> > >> migration status. Fyi
> > >>
> > >
> > > Thank you Talat,
> > > St.Ack
> > >
> >
>


Re: [DISCUSSION] Update on HBASE-10070 / Merge into trunk

2014-05-22 Thread Nick Dimiduk
I'm in favor of integration into trunk sooner than later (it's overdue,
IMHO).


On Wed, May 21, 2014 at 6:42 PM, Devaraj Das  wrote:

> +1 for merge to trunk now.
>
> On Wed, May 21, 2014 at 5:08 PM, Enis Söztutar  wrote:
> > Hi,
> >
> > We would like to give an update on the status of HBASE-10070 work, and
> open
> > up discussion for how we can do further development.
> >
> > We seem to be at a point where we have the core functionality of the
> > region replica, as described in HBASE-10070 working. As pointed out
> > under the section "Development Phases" in the design doc posted on the
> > jira HBASE-10070, this work was divided into two broad phases. The first
> > phase introduces region replicas concept, the new consistency model, and
> > corresponding RPC implementations. All of the issues for Phase 1 can be
> > found under [3]. Phase 2 is still in the works, and contains the proposed
> > changes listed under [4].
> >
> > With all the issues committed in HBASE-10070 branch in svn, we think that
> > the "phase-1" is complete. The user documentation on HBASE-10513 gives an
> > accurate picture of what has been done in phase-1 and what the impact of
> > using this feature is, APIs etc. We have added
> > a couple of IT tests as part of this work and we have tested the work
> > we did in "phase-1" of the project quite extensively in Hortonworks'
> > infrastructure.
> >
> > In summary, with the code in branch, you can create tables with region
> > replicas, do gets / multi gets and scans using TIMELINE consistency with
> > high availability. Region replicas periodically scan the files of the
> > primary and pick up flushed / committed files. The RPC paths /
> assignment,
> > balancing etc are pretty stable. However some more performance analysis
> and
> > tuning is needed. More information can be found in [1] and [2].
> >
> >
> > As a reminder, the development has been happening in the branch -
> > hbase-10070 (https://github.com/apache/hbase/tree/hbase-10070). We have
> > been pretty diligent about more than one committer's +1 on the branch
> > commits and for almost all the subtasks in HBASE-10070 there is more than
> > one +1 except for test fix issues or more trivial issues, where there
> maybe
> >  only one +1.  Enis/Nicolas/Sergey/Devaraj/Nick are the main contributors
> > of code in the phase-1 and many patches have been reviewed already by
> > people outside
> > this group (mainly Stack, Jimmy)
> >
> > For Phase 2, we think that we can deliver on the proposed design
> > incrementally over the next couple of months. However, we think that it
> > might be ok to merge the changes from phase 1 first, then do a
> > commit-as-you-go approach for remaining items. Therefore, we would like
> to
> > propose  to merge the branch to trunk, and continue the work over there.
> > This might also result in more reviews.
> >
> > Alternatively, we can continue the work in the branch, and do the merge
> at
> > the end of Phase 2, but that will make the review process a bit more
> > tricky, which is why we would like the merge sooner.
> >
> > What do you think? Comments, concerns?
> >
> > [1]
> >
> https://issues.apache.org/jira/secure/attachment/12644237/hbase-10513_v1.patch
> > [2]
> >
> http://www.slideshare.net/enissoz/hbase-high-availability-for-reads-with-time
> > [3] https://issues.apache.org/jira/browse/HBASE-10070
> > [4] https://issues.apache.org/jira/browse/HBASE-11183
> >
> > Thanks,
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>


Re: ANNOUNCEMENT: Git Migration completed

2014-05-22 Thread Nick Dimiduk
What is the relationship between
https://git-wip-us.apache.org/repos/asf/hbase.git and git://
git.apache.org/hbase.git ? The latter is a read-only tracking repo?


On Thu, May 22, 2014 at 9:50 AM, Andrew Purtell  wrote:

> The migration from SVN to Git has completed and folks have begun committing
> to the new repository already.
>
> See https://git-wip-us.apache.org/repos/asf?p=hbase.git
>
> Clone from https://git-wip-us.apache.org/repos/asf/hbase.git
>
>
> --
> Best regards,
>
>- Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>


Re: Please fix your git name/address on your commits.

2014-06-10 Thread Nick Dimiduk
Looking at history, it appears we have our first unintended developer merge
commit pushed as well.
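One way to guard against this (a suggestion, not established project policy) is to have `git pull` rebase local work instead of creating a merge commit:

```shell
# Use a throwaway HOME so the demo doesn't touch real global config; on a
# developer machine you would just run the two `git config --global` lines.
export HOME="$(mktemp -d)"
git config --global pull.rebase true              # `git pull` rebases by default
git config --global branch.autosetuprebase always # new branches rebase on pull
git config --global pull.rebase                   # prints "true"
```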


On Tue, Jun 10, 2014 at 10:36 AM, Nicolas Liochon  wrote:

> As Matteo.
>
>
> On Tue, Jun 10, 2014 at 7:18 PM, Matteo Bertozzi 
> wrote:
>
> > I think that @unknown is a result from the svn conversion (at least
> looking
> > at my commits)
> >
> > Matteo
> >
> >
> >
> > On Tue, Jun 10, 2014 at 6:09 PM, Jonathan Hsieh 
> wrote:
> >
> > > Now that we've a few weeks with the git repo I've noticed that some of
> > use
> > > are showing up with funny names like name@unknown or @*MacBook*.
> > >  Can
> > > you all fix your emails before you next commits (my assumption is
> either
> > > apache.org or company email) ?
> > >
> > > To set just in that project, see the first answer here:
> > >
> > >
> >
> http://stackoverflow.com/questions/6116548/how-to-tell-git-to-use-the-correct-identity-name-and-email-for-a-given-project
> > >
> > > Here's a list of authors names from the past 500 commits.
> > >
> > >  Andrew Kyle Purtell 
> > >  Andrew Purtell 
> > >  anoopsamjohn 
> > >  anoopsjohn 
> > >  Devaraj Das 
> > >  eclark 
> > >  Enis Soztutar 
> > >  fenghh 
> > >  Gary Helmling 
> > >  Jean-Daniel Cryans 
> > >  Jeffrey Zhong 
> > >  jeffreyz 
> > >  Jesse Yates 
> > >  Jimmy Xiang 
> > >  Jonathan Hsieh 
> > >  Jonathan M Hsieh 
> > >  jxiang 
> > >  larsh 
> > >  Lars Hofhansl 
> > >  liangxie 
> > >  Matteo Bertozzi 
> > >  mbertozzi 
> > >  Michael Stack 
> > >  Michael Stack 
> > >  ndimiduk 
> > >  Nick Dimiduk 
> > >  Nicolas Liochon 
> > >  nkeywal 
> > >  rajeshbabu 
> > >  Ramkrishna 
> > >  ramkrishna 
> > >  sershe 
> > >  Ted Yu 
> > >  tedyu 
> > >  Zhihong Yu 
> > >  zjusch 
> > >
> > >
> > > --
> > > // Jonathan Hsieh (shay)
> > > // HBase Tech Lead, Software Engineer, Cloudera
> > > // j...@cloudera.com // @jmhsieh
> > >
> >
>


Fork protobuf?

2014-06-11 Thread Nick Dimiduk
FYI.

There's a fairly serious thread about forking protobuf happening over
on HBASE-8. Should be socialized a little wider, I think.


Metrics units?

2014-06-13 Thread Nick Dimiduk
Heya,

Do we have documentation someplace describing the units for various metics?
For instance, it's not always clear if a unit is bytes or milliseconds.
Would be nice if the metric names were self-identifying, but that will
break compatibility with anyone who's monitoring specific metrics by name,
so maybe just an update to the book?

-n


Re: jdk 1.7 & trunk

2014-06-25 Thread Nick Dimiduk
I think it's time to move master/1.0 to 1.7. Where is Hadoop on this,
though? We should keep supporting 1.6 so long as we plan to continue
supporting Hadoop versions that support 1.6.


On Wed, Jun 25, 2014 at 9:15 AM, Nicolas Liochon  wrote:

> Hi all,
>
> HBASE-11297 just broke the build because it uses a ConcurrentLinkedDeque.
>
> Should we be 1.7 only for trunk / 1.0?
> This would mean using the 1.7 features.
>
> What about .98?
>
> We would need to update our precommit env, it still builds with 1.6
> today...
>
> Shall we start a vote?
>
> Nicolas
>


Re: [NOTICE] Branching for 1.0

2014-07-01 Thread Nick Dimiduk
On Tue, Jul 1, 2014 at 4:01 PM, Enis Söztutar  wrote:

> One other thing we can do is that we can commit the patch to 0.98 if you
> +1, do the RC, but hold on for committing to 1.0. During the RC vote
> timeframe, we can then reach a consensus for whether the patch should go
> into both branches.
>

It would be a shame to lose track of patches because of this additional
administrative step happening asynchronously from the initial push of the
commit.

On Tue, Jul 1, 2014 at 3:34 PM, Andrew Purtell  wrote:
>
> > I agree just about everything related to HBASE-10856 is something that
> > merits discussion and consensus.
> >
> > > My main goal for branch-1 is to limit the exposure for unrelated
> changes
> > in the branch for a more stable release
> >
> > This is a goal shared by 0.98 so that's no issue at all.
> >
> > What we should sort out is coordinating RTC on multiple active branches.
> > For example, it's not possible for me to commit to rolling a 0.98 RC on a
> > particular day if we have a blocker that needs to go through 1.0 first,
> > since it is not clear for any given commit when or if it will be acked
> for
> > 1.0.
> >
> >
> > On Tue, Jul 1, 2014 at 3:29 PM, Enis Söztutar  wrote:
> >
> > > Agreed that for every feature including security, we should be careful
> to
> > > not create a gap in terms of support (release x supporting, release x+1
> > not
> > > supporting, release x+2 supporting etc).
> > >
> > > My main goal for branch-1 is to limit the exposure for unrelated
> changes
> > in
> > > the branch for a more stable release. If we think that we need to
> > > fix/improve some things for 1.0 and 0.98.x, it will be ok to commit.
> Some
> > > of the items linked under
> > > https://issues.apache.org/jira/browse/HBASE-10856
> > > imply big changes, but it would be ok to commit those to have a clear
> > > story.
> > >
> > > I think we can decide on a per-issue/feature basis.
> > > Enis
> > >
> > >
> > > On Tue, Jul 1, 2014 at 3:16 PM, Andrew Purtell 
> > > wrote:
> > >
> > > > Now that I think about it more, actually every commit, since I don't
> > > think
> > > > we want a situation where something goes into master and 0.98, but
> not
> > > 1.0.
> > > > We should discuss how to handle this.
> > > >
> > > >
> > > > On Tue, Jul 1, 2014 at 3:10 PM, Andrew Purtell 
> > > > wrote:
> > > >
> > > > > I'm curious what will be the policy for security commits? I plan to
> > > take
> > > > > all security changes into 0.98. If we have commits to master and
> 0.98
> > > > that
> > > > > will result in a serious feature / functionality discontinuity.
> > > > >
> > > > >
> > > > > On Mon, Jun 30, 2014 at 8:56 PM, Enis Söztutar  >
> > > > wrote:
> > > > >
> > > > >> I've pushed the branch, named branch-1:
> > > > >>
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=hbase.git;a=shortlog;h=refs/heads/branch-1
> > > > >>
> > > > >> Please do not commit new features to branch-1 without pinging the
> RM
> > > > (for
> > > > >> 1.0 it is me). Bug fixes, and trivial commits can always go in.
> > > > >>
> > > > >> That branch still has 0.99.0-SNAPSHOT as the version number, since
> > > next
> > > > >> expected release from that is 0.99.0. Jenkins build for this
> branch
> > is
> > > > >> setup at https://builds.apache.org/view/All/job/HBase-1.0/. It
> > builds
> > > > >> with
> > > > >> latest jdk7. I'll try to stabilize the unit tests for the first
> RC.
> > > > >>
> > > > >> I've changed the master version as well. It now builds with
> > > > >> 2.0.0-SNAPSHOT.
> > > > >> Exciting!
> > > > >>
> > > > >> Enis
> > > > >>
> > >
> >
> > --
> > Best regards,
> >
> >- Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>


Re: DISCUSSION: 1.0.0

2014-07-07 Thread Nick Dimiduk
Would you mind including the JIRA numbers along with the request?

Thanks,
Nick


On Mon, Jul 7, 2014 at 9:52 AM, Aditya  wrote:

> Do we want to have the C APIs part of 1.0.0 release. I had posted few
> patches and a set of review request sometime last week.
>
>
> On Fri, Jul 4, 2014 at 1:21 AM, Enis Söztutar  wrote:
>
> > On Thu, Jul 3, 2014 at 4:41 PM, Mikhail Antonov 
> > wrote:
> >
> > > Moved ZK watcher & listener subtask out of scope HBASE-10909. Enis -
> with
> > > that, I guess HBASE-10909 can be marked in branch-1?
> > >
> >
> > Sounds good.
> >
> >
> > >
> > > HBASE-11464 - this is the jira where I'll capture tasks to abstract
> hbase
> > > client from ZK (mostly it would be post-1.0 work).
> > >
> >
> > Not sure whether we can make it fully backwards compatible with 1.0
> > clients. I guess we will see when the patches are done.
> >
> >
> > >
> > > Thanks,
> > > Mikhail
> > >
> > >
> > > 2014-07-03 12:52 GMT-07:00 Stack :
> > >
> > > > On Thu, Jul 3, 2014 at 12:25 PM, Mikhail Antonov <
> olorinb...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Guys,
> > > > >
> > > > > getting back to ZK abstraction work w.r.t. release 1.0 and
> > thereafter,
> > > > some
> > > > > status update. So as we're getting closer to complete HBASE-10909,
> it
> > > > looks
> > > > > like the steps may be like this:
> > > > >
> > > > >  - there are 2 subtasks out there not closed yet, one of which is
> > about
> > > > log
> > > > > splitting (and Sergey S has submitted a patch for review), another
> is
> > > > > abstraction of ZK watcher (this is what I've been working on in the
> > > > > background; but after sketching the patch it seems like without
> being
> > > > able
> > > > > to modify the control flows and some changes in the module
> structure,
> > > > it'd
> > > > > be a lot of scaffolding code not really simplifying current code).
> So
> > > I'd
> > > > > propose to descope abstraction of ZK watcher jira (HBASE-11073),
> > > namely:
> > > > > convert it to top-level JIRA and continue to work on it separately;
> > > > rename
> > > > > HBASE-10909 to "ZK abstraction: phase 1", and mark it as closed as
> > soon
> > > > as
> > > > > log splitting jira is completed. This way HBASE-10909 fits to
> > branch-1.
> > > > >
> > > >
> > > > Sounds good to me.
> > > >
> > > >
> > > > >  - secondly, in the discussion to the CatalogTracker patch, we
> > started
> > > > > talking about modifying client to not know about ZK, but rather
> keep
> > > the
> > > > > location of current masters and talk to them using RPC calls. This
> > work
> > > > can
> > > > > not go into branch-1, as it involves invasive changes in client
> > > including
> > > > > new RPC. As I understand the branching schema now, those changes
> can
> > go
> > > > to
> > > > > master branch, we just don't merge them to branch-1, and depending
> on
> > > > their
> > > > > completeness we can pull them to 1.1 release or so.
> > > > >
> > > >
> > > > You have it right Mikhail.
> > > >
> > > > St.Ack
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Michael Antonov
> > >
> >
>


Re: DISCUSSION: 1.0.0

2014-07-09 Thread Nick Dimiduk
This ticket has only open subtasks, i.e. nothing in 'patch available'. I
assume you mean HBASE-10168. We'll see about getting you some reviews, but
you should also go about formatting the patch for buildbot. Also, since
your 3 reviews are individually 100+k, you should consider breaking them
into three separate tickets.

my 2¢
-n


On Mon, Jul 7, 2014 at 12:01 PM, Aditya  wrote:

> Sorry about that.
>
> Here is the umbrella JIRA https://issues.apache.org/jira/browse/HBASE-1015
>
>
> On Mon, Jul 7, 2014 at 10:05 AM, Nick Dimiduk  wrote:
>
>> Would you mind including the JIRA numbers along with the request?
>>
>> Thanks,
>> Nick
>>
>>
>> On Mon, Jul 7, 2014 at 9:52 AM, Aditya  wrote:
>>
>>> Do we want to have the C APIs part of 1.0.0 release. I had posted few
>>> patches and a set of review request sometime last week.
>>>
>>>
>>> On Fri, Jul 4, 2014 at 1:21 AM, Enis Söztutar 
>>> wrote:
>>>
>>> > On Thu, Jul 3, 2014 at 4:41 PM, Mikhail Antonov 
>>> > wrote:
>>> >
>>> > > Moved ZK watcher & listener subtask out of scope HBASE-10909. Enis -
>>> with
>>> > > that, I guess HBASE-10909 can be marked in branch-1?
>>> > >
>>> >
>>> > Sounds good.
>>> >
>>> >
>>> > >
>>> > > HBASE-11464 - this is the jira where I'll capture tasks to abstract
>>> hbase
>>> > > client from ZK (mostly it would be post-1.0 work).
>>> > >
>>> >
>>> > Not sure whether we can make it fully backwards compatible with 1.0
>>> > clients. I guess we will see when the patches are done.
>>> >
>>> >
>>> > >
>>> > > Thanks,
>>> > > Mikhail
>>> > >
>>> > >
>>> > > 2014-07-03 12:52 GMT-07:00 Stack :
>>> > >
>>> > > > On Thu, Jul 3, 2014 at 12:25 PM, Mikhail Antonov <
>>> olorinb...@gmail.com
>>> > >
>>> > > > wrote:
>>> > > >
>>> > > > > Guys,
>>> > > > >
>>> > > > > getting back to ZK abstraction work w.r.t. release 1.0 and
>>> > thereafter,
>>> > > > some
>>> > > > > status update. So as we're getting closer to complete
>>> HBASE-10909, it
>>> > > > looks
>>> > > > > like the steps may be like this:
>>> > > > >
>>> > > > >  - there are 2 subtasks out there not closed yet, one of which is
>>> > about
>>> > > > log
>>> > > > > splitting (and Sergey S has submitted a patch for review),
>>> another is
>>> > > > > abstraction of ZK watcher (this is what I've been working on in
>>> the
>>> > > > > background; but after sketching the patch it seems like without
>>> being
>>> > > > able
>>> > > > > to modify the control flows and some changes in the module
>>> structure,
>>> > > > it'd
>>> > > > > be a lot of scaffolding code not really simplifying current
>>> code). So
>>> > > I'd
>>> > > > > propose to descope abstraction of ZK watcher jira (HBASE-11073),
>>> > > namely:
>>> > > > > convert it to top-level JIRA and continue to work on it
>>> separately;
>>> > > > rename
>>> > > > > HBASE-10909 to "ZK abstraction: phase 1", and mark it as closed
>>> as
>>> > soon
>>> > > > as
>>> > > > > log splitting jira is completed. This way HBASE-10909 fits to
>>> > branch-1.
>>> > > > >
>>> > > >
>>> > > > Sounds good to me.
>>> > > >
>>> > > >
>>> > > > >  - secondly, in the discussion to the CatalogTracker patch, we
>>> > started
>>> > > > > talking about modifying client to not know about ZK, but rather
>>> keep
>>> > > the
>>> > > > > location of current masters and talk to them using RPC calls.
>>> This
>>> > work
>>> > > > can
>>> > > > > not go into branch-1, as it involves invasive changes in client
>>> > > including
>>> > > > > new RPC. As I understand the branching schema now, those changes
>>> can
>>> > go
>>> > > > to
>>> > > > > master branch, we just don't merge them to branch-1, and
>>> depending on
>>> > > > their
>>> > > > > completeness we can pull them to 1.1 release or so.
>>> > > > >
>>> > > >
>>> > > > You have it right Mikhail.
>>> > > >
>>> > > > St.Ack
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Thanks,
>>> > > Michael Antonov
>>> > >
>>> >
>>>
>>
>>
>


Hadoop-Azure storage

2014-07-11 Thread Nick Dimiduk
FYI. It looks like the Microsoft folks want to make Azure the definitive
cloud on which to run HBase.

-n

https://issues.apache.org/jira/browse/HADOOP-10809


Re: Hadoop-Azure storage

2014-07-11 Thread Nick Dimiduk
I believe "transaction log" == WAL.


On Fri, Jul 11, 2014 at 9:57 AM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> Interesting. What do they mean by "HBase transaction log files"? Do they
> talk about a transactional framework? Or about the WALs/HFiles???
>
>
> 2014-07-11 12:22 GMT-04:00 Nick Dimiduk :
>
> > FYI. It looks like the Microsoft folks want to make Azure the definitive
> > cloud on which to run HBase.
> >
> > -n
> >
> > https://issues.apache.org/jira/browse/HADOOP-10809
> >
>


Re: build a hbase client only package

2014-07-16 Thread Nick Dimiduk
You need to build the client? Is it not sufficient to add a dependency on
the client library to your project? Maven/Ivy should sort the rest for you.
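For a typical Java application that just needs to talk to a cluster, the build dependency looks something like this (the version shown is illustrative; pick the release matching your cluster):

```xml
<!-- Sketch of a client-side Maven dependency; the version is illustrative. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>0.98.3-hadoop2</version>
</dependency>
```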


On Wed, Jul 16, 2014 at 1:24 AM, Ted Yu  wrote:

> I looked at top level pom.xml
> I didn't find such a profile.
>
> Cheers
>
> On Jul 15, 2014, at 4:41 PM, Jerry He  wrote:
>
> > Hi, folk
> >
> > Is there a HBase mvn build target to build a package for a HBase
> standalone
> > client, including the hbase-client, hbase-common, and dependencies that
> are
> > required by a working client?
> >
> > Thanks in advance.
> >
> > Jerry
>


Re: build a hbase client only package

2014-07-16 Thread Nick Dimiduk
What does "client package" include? Just the java client library? The
shell? REST/Thrift gateways?


On Wed, Jul 16, 2014 at 3:11 PM, Jerry He  wrote:

> Thanks for the reply, Ted, Nick.
> As Nick said, with maven/ivy and access to repositories, people can build
> and run their client applications.
>
> Exploring options to provide a client package download for our users.
>
>
> On Wed, Jul 16, 2014 at 8:28 AM, Nick Dimiduk  wrote:
>
> > You need to build the client? Is it not sufficient to add a dependency on
> > the client library to your project? Maven/Ivy should sort the rest for
> you.
> >
> >
> > On Wed, Jul 16, 2014 at 1:24 AM, Ted Yu  wrote:
> >
> > > I looked at top level pom.xml
> > > I didn't find such a profile.
> > >
> > > Cheers
> > >
> > > On Jul 15, 2014, at 4:41 PM, Jerry He  wrote:
> > >
> > > > Hi, folk
> > > >
> > > > Is there a HBase mvn build target to build a package for a HBase
> > > standalone
> > > > client, including the hbase-client, hbase-common, and dependencies
> that
> > > are
> > > > required by a working client?
> > > >
> > > > Thanks in advance.
> > > >
> > > > Jerry
> > >
> >
>


Re: build a hbase client only package

2014-07-16 Thread Nick Dimiduk
hbase-shell is a separate module in the java build (as is hbase-client).
Are you thinking in terms of the java build or of bigtop packaging here?


On Wed, Jul 16, 2014 at 4:18 PM, Jerry He  wrote:

> Hi, Nick
>
> If there is a way to include the hbase shell in a 'client only'
> package/assembly, that will be great.
> For the  REST/Thrift gateways, there are probably people who want to have
> separate packages for them to install them. But not on my mind yet.
>
>
>
> On Wed, Jul 16, 2014 at 3:44 PM, Nick Dimiduk  wrote:
>
> > What does "client package" include? Just the java client library? The
> > shell? REST/Thrift gateways?
> >
> >
> > On Wed, Jul 16, 2014 at 3:11 PM, Jerry He  wrote:
> >
> > > Thanks for the reply, Ted, Nick.
> > > As Nick said, with maven/ivy and access to repositories, people can
> build
> > > and run their client applications.
> > >
> > > Exploring options to provide a client package download for our users.
> > >
> > >
> > > On Wed, Jul 16, 2014 at 8:28 AM, Nick Dimiduk 
> > wrote:
> > >
> > > > You need to build the client? Is it not sufficient to add a
> dependency
> > on
> > > > the client library to your project? Maven/Ivy should sort the rest
> for
> > > you.
> > > >
> > > >
> > > > On Wed, Jul 16, 2014 at 1:24 AM, Ted Yu  wrote:
> > > >
> > > > > I looked at top level pom.xml
> > > > > I didn't find such a profile.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Jul 15, 2014, at 4:41 PM, Jerry He  wrote:
> > > > >
> > > > > > Hi, folk
> > > > > >
> > > > > > Is there a HBase mvn build target to build a package for a HBase
> > > > > standalone
> > > > > > client, including the hbase-client, hbase-common, and
> dependencies
> > > that
> > > > > are
> > > > > > required by a working client?
> > > > > >
> > > > > > Thanks in advance.
> > > > > >
> > > > > > Jerry
> > > > >
> > > >
> > >
> >
>


HFileLink backreferences

2014-07-23 Thread Nick Dimiduk
Heya,

I see that we maintain backreferences for hfilelinks. This appears to be
used by HFileLinkCleaner to determine when an HFile has no HFileLinks and
thus whether it can be deleted without orphaning those links.

This is problematic for the mapreduce over snapshot files feature:
restoring a snapshot can create HFileLinks to existing files in the restore
directory, and those links in turn create back-references under the root
path. Thus we have a situation where the user running the MR job requires
write access to the hbase root path.

Was this already discussed in the original ticket (HBASE-8369)? We mention
the requirement of read permissions (and bypassing security) in the release
note, but I didn't see any comments about write access. Requiring write
permission effectively means you can only run MR as the hbase user, which is
pretty much a non-starter for any interesting integration of the feature.

Thoughts?

Thanks,
Nick


Update mirror HEADER.htm

2014-07-31 Thread Nick Dimiduk
Just noticed, the header we push out to mirrors has old advice regarding
which release is stable.

http://apache.spinellicreations.com/hbase/HEADER.html

Is updating this a simple matter of updating the file on people.apache.org:
/www/www.apache.org/dist/hbase/HEADER.html ?

Thanks,
Nick


Re: adding a profile to build thrift generated classes

2014-07-31 Thread Nick Dimiduk
So long as you can enforce that the thrift binary version matches whatever
the pom.xml requires, it's fine by me. You want to cook up a patch, Sean?
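A sketch of what such a profile might look like, using exec-maven-plugin to shell out to a locally installed thrift compiler. The profile id, plugin wiring, and thrift file path here are illustrative assumptions, not an existing HBase profile:

```xml
<!-- Hypothetical profile: regenerate thrift classes with -Pcompile-thrift;
     requires a `thrift` binary on the PATH matching the expected version. -->
<profile>
  <id>compile-thrift</id>
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <executions>
          <execution>
            <phase>generate-sources</phase>
            <goals><goal>exec</goal></goals>
            <configuration>
              <executable>thrift</executable>
              <arguments>
                <argument>--gen</argument>
                <argument>java</argument>
                <argument>-out</argument>
                <argument>${basedir}/src/main/java</argument>
                <argument>${basedir}/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift</argument>
              </arguments>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
```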


On Wed, Jul 30, 2014 at 11:12 PM, Srikanth Srungarapu  wrote:

> +1 on the idea. Currently, there is a profile for generating protobuf files
> (http://hbase.apache.org/book/build.html), but I couldn't find any analog
> for thrift. So, this can come handy when needed.
>
>
> On Wed, Jul 30, 2014 at 10:47 PM, Sean Busbey  wrote:
>
> > Hiya!
> >
> > Currently, the only instructions for generating our RPC classes via
> thrift
> > that I can find are in a package javadoc[1].
> >
> > While I doubt we rebuild them very often, I was thinking we could ease
> the
> > process by adding a maven profile that took care of the rebuilding and
> > copying into place.
> >
> > It would still require having thrift installed on the build system, so it
> > would need to be disabled by default.
> >
> > Wanted to check for concerns / objections before filing a jira.
> >
> > [1]: http://s.apache.org/hZv
> >
> > --
> > Sean
> >
>


Re: [DISCUSSION] applying patches

2014-08-01 Thread Nick Dimiduk
Can we enforce the use of --signoff with a hook? If author email isn't @
apache.org then signoff stamp is required, something like this?
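For illustration, the check could be as small as this hypothetical hook sketch (the policy and names here are mine, not an existing HBase hook; a real commit-msg hook would read the author from `git var GIT_AUTHOR_IDENT` and the message from the file passed as its first argument):

```python
#!/usr/bin/env python
# Hypothetical commit-msg hook logic (illustrative only): require a
# Signed-off-by line whenever the commit author's email is not an
# @apache.org address.
import re

def needs_signoff(author_email):
    # Committers' own commits (apache.org authors) need no sign-off.
    return not author_email.endswith("@apache.org")

def has_signoff(message):
    # Look for a "Signed-off-by: " trailer at the start of any line.
    return re.search(r"^Signed-off-by: ", message, re.MULTILINE) is not None

def check(author_email, message):
    # Return True when the commit should be accepted.
    return not (needs_signoff(author_email) and not has_signoff(message))

print(check("dev@apache.org", "HBASE-1234 Fix thing"))            # True
print(check("contrib@example.com", "HBASE-1234 Fix thing"))       # False
print(check("contrib@example.com",
            "HBASE-1234 Fix thing\n\nSigned-off-by: A Committer"))  # True
```

The hook would exit non-zero in the False case to reject the commit.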


On Fri, Aug 1, 2014 at 10:53 AM, Mike Drob  wrote:

> It's a little more work, but without the sign-off line you can still find
> the committer by doing "git log --format=full"
>
> Mike
>
>
> On Fri, Aug 1, 2014 at 12:43 PM, Andrew Purtell 
> wrote:
>
> > I noticed that commit and just sent something to private@ :-) I think
> it's
> > a fine practice, but there is no sign off line on that commit so the
> > committer is not apparent. As long as we can know who committed the patch
> > at a glance it sounds good to me.
> >
> >
> > On Fri, Aug 1, 2014 at 10:38 AM, Stack  wrote:
> >
> > > I just committed a message with 'git am' because the author took the
> > > trouble to write a sweet commit message. Others have been taking the
> > > trouble to write useful commit messages but up to this I've been just
> > > applying patches with patch with a commit message that is the issue
> > number,
> > > subject, and author only rather than git apply or git am.
> > >
> > > On the tail of HBASE-4593, Misty is looking for clarification.
> > >
> > > I suggest that if contributor wrote a nice commit message that leads
> off
> > > with issue number and issue subject, going forward, we preserve their
> > work
> > > and apply using git am --signoff?
> > >
> > > You cowboys and cowgirls have any opinions?
> > > St.Ack
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >- Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>


Re: [DISCUSSION] applying patches

2014-08-01 Thread Nick Dimiduk
Apparently the signoff feature was added for the benefit of Linux's
"Certificate of Origin" requirements. By using the same mechanism, are we
implying that we follow the same guidelines in its use? I don't think it's
common for a committer to assert the OSS validity of the contribution, or
that the contributor has completed the Apache ICLA.

http://gerrit.googlecode.com/svn/documentation/2.0/user-signedoffby.html


On Fri, Aug 1, 2014 at 11:01 AM, Nick Dimiduk  wrote:

> Can we enforce the use of --signoff with a hook? If author email isn't @
> apache.org then signoff stamp is required, something like this?
>
>
> On Fri, Aug 1, 2014 at 10:53 AM, Mike Drob  wrote:
>
>> It's a little more work, but without the sign-off line you can still find
>> the committer by doing "git log --format=full"
>>
>> Mike
>>
>>
>> On Fri, Aug 1, 2014 at 12:43 PM, Andrew Purtell 
>> wrote:
>>
>> > I noticed that commit and just sent something to private@ :-) I think
>> it's
>> > a fine practice, but there is no sign off line on that commit so the
>> > committer is not apparent. As long as we can know who committed the
>> patch
>> > at a glance it sounds good to me.
>> >
>> >
>> > On Fri, Aug 1, 2014 at 10:38 AM, Stack  wrote:
>> >
>> > > I just committed a message with 'git am' because the author took the
>> > > trouble to write a sweet commit message. Others have been taking the
>> > > trouble to write useful commit messages but up to this I've been just
>> > > applying patches with patch with a commit message that is the issue
>> > number,
>> > > subject, and author only rather than git apply or git am.
>> > >
>> > > On the tail of HBASE-4593, Misty is looking for clarification.
>> > >
>> > > I suggest that if contributor wrote a nice commit message that leads
>> off
>> > > with issue number and issue subject, going forward, we preserve their
>> > work
>> > > and apply using git am --signoff?
>> > >
>> > > You cowboys and cowgirls have any opinions?
>> > > St.Ack
>> > >
>> >
>> >
>> >
>> > --
>> > Best regards,
>> >
>> >- Andy
>> >
>> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> > (via Tom White)
>> >
>>
>
>


Re: what label is used for issues that are good on ramps for new contributors?

2014-08-05 Thread Nick Dimiduk
Is "noob" really "pejorative"? If you must change it, I prefer "beginner".

I do like having a distinction separate from the severity of the ticket;
the criticality is orthogonal to how complex or involved its solution
might be.


On Tue, Aug 5, 2014 at 11:27 AM, Jonathan Hsieh  wrote:

> +1 for beginner (+0.5 for all the others).
>
> Jon.
>
>
> On Tue, Aug 5, 2014 at 10:55 AM, Mikhail Antonov 
> wrote:
>
> > "Could a non-native English speaker comment on if either makes more sense
> > (or if something else would be better still)?"
> >
> > "beginner" or "for_beginners" would be good IMO.
> >
> > -Mikhail
> >
> >
> > 2014-08-05 8:43 GMT-07:00 Sean Busbey :
> >
> > > The labels "introductory" or "intro" seem the most straightforward to
> me.
> > >
> > > Could a non-native English speaker comment on if either makes more
> sense
> > > (or if something else would be better still)?
> > >
> > > -Sean
> > >
> > >
> > > On Tue, Aug 5, 2014 at 10:35 AM, Andrew Purtell 
> > > wrote:
> > >
> > > > Yes we can change that. I volunteer to change it. What should the new
> > > label
> > > > be?
> > > >
> > > >
> > > > On Tue, Aug 5, 2014 at 7:18 AM, Jonathan Hsieh 
> > wrote:
> > > >
> > > > > We've just used 'noob', and in other cases marked the issue as
> > > 'trivial'.
> > > > >
> > > > > If we really wanted to remove any perceived stigma we'd call
> > > them
> > > > > 'starter' or 'intro' issue to remove negative connotation.  If
> others
> > > > agree
> > > > > we can change this.
> > > > >
> > > > > Jon.
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Aug 5, 2014 at 6:43 AM, Sean Busbey 
> > > wrote:
> > > > >
> > > > > > I'd like to add to the "Getting Involved" section on jira[1] to
> > add a
> > > > > > pointer to issues that are a good for ramping up on HBase
> > > development.
> > > > > >
> > > > > > Preferably, these would be tickets that
> > > > > >
> > > > > > * Have been vetted as desired and non-controversial
> > > > > > * Are low priority so they need not be done under time pressure
> > > > > > * Offer a view of some (small-ish) part of the code base
> > > > > > * Require minimal information about HBase outside of the section
> to
> > > be
> > > > > > fixed
> > > > > >
> > > > > > Before I add file a ticket and put up a patch, what label do we
> use
> > > to
> > > > > mark
> > > > > > these tickets? Looking at jira, I can see some evidence of both
> > > > > "newbie"[2]
> > > > > > and "noob"[3].
> > > > > >
> > > > > > I am used to seeing "newbie" on other projects, so I had started
> > with
> > > > > that
> > > > > > label. However, it looks like "noob" is used more regularly on
> > HBase.
> > > > > >
> > > > > > I worry that "noob" might be viewed as pejorative and discourage
> > some
> > > > > > potential contributors.
> > > > > >
> > > > > > [1]: http://hbase.apache.org/book.html#jira
> > > > > > [2]: http://s.apache.org/exr
> > > > > > [3]: http://s.apache.org/XS6
> > > > > >
> > > > > > --
> > > > > > Sean
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > // Jonathan Hsieh (shay)
> > > > > // HBase Tech Lead, Software Engineer, Cloudera
> > > > > // j...@cloudera.com // @jmhsieh
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > >- Andy
> > > >
> > > > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein
> > > > (via Tom White)
> > > >
> > >
> > >
> > >
> > > --
> > > Sean
> > >
> >
> >
> >
> > --
> > Thanks,
> > Michael Antonov
> >
>
>
>
> --
> // Jonathan Hsieh (shay)
> // HBase Tech Lead, Software Engineer, Cloudera
> // j...@cloudera.com // @jmhsieh
>


[VOTE] The 1st hbase 0.94.22 release candidate is available for download

2014-08-07 Thread Nick Dimiduk
- verified md5 [✓]
- verified asc [✓]
- build and test default profile with Oracle 1.6.0_31 [✓]
Tests run: 926, Failures: 0, Errors: 0, Skipped: 2
- build and test default profile with Oracle 1.7.0_21 [✓]
Tests run: 926, Failures: 0, Errors: 0, Skipped: 2
- Build vs hadoop-2.4.0 completes but I get a test failure. I see that
version is "Not Tested" in our support matrix, so [✓]

This machine is kind of a mess and I can't get hadoop-1 working to test the
RC properly. I'm headed out of town and I won't have more time before the
deadline, but I thought I'd report my findings. No issues from my side but
I can't really sign off on it either, so +0 (non-binding anyway, we require
PMC, right?) from me.
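For reference, the md5 step in the checklist above amounts to something like this sketch (paths are illustrative, and the .asc signature check still needs `gpg --verify` separately):

```python
# Sketch of RC checksum verification: recompute a file's MD5 and compare
# it against the published .md5 value. The tarball path is illustrative.
import hashlib
import os
import tempfile

def md5_of(path, chunk=1 << 20):
    # Hash the file incrementally so large release tarballs fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def verify(path, published_hex):
    return md5_of(path) == published_hex.strip().lower()

# Demo against a throwaway file instead of the real release tarball.
fd, tmp = tempfile.mkstemp()
os.write(fd, b"hbase release candidate bytes")
os.close(fd)
print(verify(tmp, md5_of(tmp)))  # True
os.unlink(tmp)
```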

-n

On Wed, Aug 6, 2014 at 11:25 AM, Andrew Purtell  wrote:

> +1
>
> - Hash matches tarball
> - Signature validates
> - Launched an all-localhost cluster and ran LoadTestTool up to 1M rows, no
> errors or items of concerns observed in the logs, using 7u65
> - Built from source using the Hadoop 2 profile
> - RAT checks pass
> - Unit tests pass locally, using 7u65
>
>
>
>
> On Sat, Aug 2, 2014 at 9:28 AM, lars hofhansl  wrote:
>
> > The 1st 0.94.22 RC is available for download at
> > http://people.apache.org/~larsh/hbase-0.94.22-rc0/
> > Signed with my code signing key: C7CFE328
> >
> > HBase 0.94.22 is a small bug fix release with 13 fixes:
> > [HBASE-10645] - Fix wrapping of Requests Counts Regionserver level
> > metrics
> > [HBASE-11360] - SnapshotFileCache causes too many cache refreshes
> > [HBASE-11479] - SecureConnection can't be closed when SecureClient is
> > stopping because InterruptedException won't be caught in
> > SecureClient#setupIOstreams()
> > [HBASE-11496] - HBASE-9745 broke cygwin CLASSPATH translation
> > [HBASE-11552] - Read/Write requests count metric value is too short
> > [HBASE-11565] - Stale connection could stay for a while
> > [HBASE-11633] - [0.94] port HBASE-11217 Race between SplitLogManager
> > task creation + TimeoutMonitor
> > [HBASE-2217] - VM OPTS for shell only
> > [HBASE-7910] - Dont use reflection for security
> > [HBASE-11444] - Remove use of reflection for User#getShortName
> > [HBASE-11450] - Improve file size info in SnapshotInfo tool
> > [HBASE-11480] - ClientScanner might not close the HConnection created
> > in construction
> > [HBASE-11623] - mutateRowsWithLocks might require
> updatesLock.readLock
> > with waitTime=0
> >
> > The list of changes is also available here:
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12327117
> >
> > Here's the test run for this RC:
> > https://builds.apache.org/job/HBase-0.94.22/34/
> >
> > Please try out the RC, check out the doc, take it for a spin, etc, and
> > vote +1/-1 by EOD August 8th on whether we should release this as
> 0.94.22.
> >
> > Thanks.
> >
> > -- Lars
>
>
>
>
> --
> Best regards,
>
>- Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>


Re: blockcache 101

2014-08-09 Thread Nick Dimiduk
Nice work!

On Friday, August 8, 2014, Stack  wrote:

> Here is a follow up to Nick's blockcache 101 that compares a number of
> deploys x loadings and makes recommendation:
> https://blogs.apache.org/hbase/
> St.Ack
>
>
> On Fri, Apr 4, 2014 at 9:22 PM, Stack  wrote:
>
> > Nick:
> >
> > + You measure 99th percentile.  Did you take measure of average/mean
> > response times doing your blockcache comparison?  (Our LarsHofhansl had
> it
> > that that on average reads out of bucket cache were a good bit slower).
>  Or
> > is this a TODO?
> > + We should just remove slabcache because bucket cache is consistently
> > better and why have two means of doing same thing?  Or, do you need more
> > proof bucketcache subsumes slabcache?
> >
> > Thanks boss,
> > St.Ack
> >
> >
>


Re: [DISCUSS] Hadoop version compatibility table

2018-08-21 Thread Nick Dimiduk
Hi Josh,

I'd prefer we maintain the simple "traffic light" labels for our
compatibility. I'm confused about what the difference might be between "NP"
and "X". I'm also confused how there can be any label with less
functionality than "X".

-n

On Tue, 21 Aug 2018 at 08:12 Josh Elser  wrote:

> Mich T had cross-posted a question to users@{hbase,phoenix} the other
> day. After some more information, we were able to find out that Mich was
> trying to use Hadoop 3.1 with HBase 1.2.6
>
> After pointing Mich to the compatibility table[1], I was about to puff
> out my chest and say "look! this table could've told you that HBase
> 1.2.6 wouldn't work with Hadoop 3.x!"
>
> But, then I realized that we don't have a single entry for HBase that
> implies it would even work for Hadoop 3. We presently have the following:
>
>"S" = supported
>"X" = not supported
>"NT" = Not tested
>
> I propose that we should add another "value" for cells in the table
> to better represent "Works, but not battle-tested" or similar. That
> would make possible values:
>
>"S" = supported
>"NP" = not production ready
>"X" = not supported
>"NT" = not tested
>
> Furthermore, the word "supported" drives me up a wall (as I think it
> implies the wrong mindset for an open source community), and I would
> rather see "functional". e.g.
>
>"F" = Fully functional, production ready
>"NP" = Functional, but not production ready/has known issues
>"X" = Not functional, lacking basic ability
>"NT" = Not tested, functionality is unknown
>
> Thoughts? Things that I've missed?
>
> - Josh
>
> [1] http://hbase.apache.org/book.html#hadoop
>


Re: [DISCUSS] Release cadence for HBase 2.y

2018-11-08 Thread Nick Dimiduk
This is an important topic. Thanks for bringing it up.

For what it’s worth, I found the “release train” to work pretty well for
patch releases from 1.1. That was only possible because of the stability of
that branch. After the first couple releases, devs were pretty good about
honoring the “bug fixes only” mandate. It was sometimes a challenge to
summon a voting PMC quorum for each release. I also felt like I was
sometimes stepping on the toes of other RMs to attract said attention. I
think a monthly cadence from multiple release branches would be too much of
a burden in manual effort on our PMC. I’d like to see more automation for the
bulk of the voting process, even giving our automation a binding vote.

Thanks,
Nick

On Thu, 8 Nov 2018 at 15:28 Josh Elser  wrote:

> I think what I'd be concerned about WRT time-based releases is the
> burden on RM to keep the branch in a good state. Perhaps we need to not
> push that onto an RM and do better about sharing that load (looking in
> the mirror).
>
> However, I do like time-based releases as a means to avoid "hurt
> feelings" (e.g. the personal ties of a developer to a feature. "The
> release goes out on /yy/xx, this feature is not yet ready, can go
> out one month later.." etc)
>
> On 11/7/18 2:31 PM, Sean Busbey wrote:
> > Hi folks!
> >
> > Some time ago we talked about trying to get back on track for a more
> > regular cadence of minor releases rather than maintenance releases
> > (like how we did back pre-1.0). That never quite worked out for the
> > HBase 1.y line, but is still something we could make happen for HBase
> > 2.
> >
> > We're coming up on 4 months since the 2.1 release line started. ATM
> > there are 63 issues in JIRA that claim to be in 2.2.0 and not in any
> > 2.1.z version[1].
> >
> > The main argument against starting to do a 2.2.0 release is that
> > nothing springs out of that list as a "feature" that would entice
> > users to upgrade. Waiting for these kinds of selling points to drive a
> > release is commonly referred to as "feature based releases." I think
> > it would be fair to characterize the HBase 2.0 release as feature
> > based centered on AMv2.
> >
> > An alternative to feature based releases is date based releases where
> > we decide that e.g. we'll have a minor release each month regardless
> > of how much is included in it. This is sometimes also called "train
> > releases" as an analogy to how trains leave a station on a set
> > schedule without regard to which individual passengers are ready. Just
> > as you'd catch the next scheduled train if you miss-timed your
> > arrival, fixes or features that aren't ready just go in the next
> > regular release.
> >
> > Personally, I really like the idea of doing date based releases for
> > minor releases with maintenance releases essentially only happening on
> > whatever our "stable" designator points at. It would mean those who
> > don't want the risk and benefits of our current release-ready work
> > could stay on a defined path while we could move away from maintaining
> > a ton of branches, some of which don't even see releases (currently ~3
> > that are > 3 months since a release). If some folks had a specific
> > need for a different minor release line and were willing to do the
> > backport and RM work for that line, they'd of course be free to do so.
> >
> > I know there are some current unknowns around 2.2 specifically. I
> > think stack mentioned to me that there's an upgrade consideration that
> > we need to hammer out since I don't see anything specific to 2.2 in
> > the "Upgrade Paths" section of the ref guide right now. While I am
> > interested in getting 2.2 going specifically, I'd like to make sure we
> > address the general topic of regularly getting new minor releases out.
> > If we already had an expectation that there'd be a minor release every
> > e.g. month or 2 months then I expect whatever upgrade issue would have
> > been addressed as a part of the change that caused it going in.
> >
> > What do folks think?
> >
> > [1]:
> > https://s.apache.org/AAma
> >
>


Re: Rolling 2.1.2 and 2.0.4

2018-12-06 Thread Nick Dimiduk
For security-related deprecations, we'll publish a new release of the same
version, with an additional version number appended. For instance, a
critical bug in x.y.z would cause its deprecation, and a new x.y.z.1 would be
released in its place, containing as a delta only the patch necessary to
resolve the critical issue.
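As a tiny illustration of the ordering (not HBase code), component-wise comparison puts the x.y.z.1 re-release between x.y.z and the next patch release:

```python
# Illustrative only: a four-component re-release like x.y.z.1 sorts after
# the deprecated x.y.z it replaces, so tooling that compares versions
# component-wise naturally picks up the fixed build.
def parse_version(v):
    return tuple(int(part) for part in v.split("."))

print(parse_version("1.4.9.1") > parse_version("1.4.9"))  # True
print(sorted(["1.4.10", "1.4.9", "1.4.9.1"], key=parse_version))
# ['1.4.9', '1.4.9.1', '1.4.10']
```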

-n

On Wed, Dec 5, 2018 at 9:56 PM Zach York 
wrote:

> Do we have a process for deprecating releases with critical bugs? I'm not
> sure this particular bug would warrant that, but I'm just curious if such a
> process is in place.
>
> On Wed, Dec 5, 2018, 9:46 PM Stack 
> > Let me put notice on download page too (Let me write the user-list).
> >
> > Wondering if anyone has a bit of time to take a look at the flakeys? For
> > 2.1, they cause a fail every other time:
> >
> >
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.0/lastSuccessfulBuild/artifact/dashboard.html
> >
> > Otherwise I'll try and take a look.
> >
> > Thanks,
> > St.Ack
> >
> > On Wed, Dec 5, 2018 at 9:34 PM Allan Yang  wrote:
> >
> > > According to @Zheng Hu, this memory leak was introduced by HBASE-20704,
> > > which will only affect the recently released versions 2.0.3 and 2.1.1. I
> > > also think we need to notify those users.
> > > Best Regards
> > > Allan Yang
> > >
> > >
> > > Yu Li  wrote on Thu, Dec 6, 2018 at 12:46 PM:
> > >
> > > > Memory leak is critical enough to roll a new release, and maybe we
> > should
> > > > add a notice somewhere for 2.x users, like our ref guide and/or user
> > > > mailing list? Thanks.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Thu, 6 Dec 2018 at 12:39, Reid Chan 
> wrote:
> > > >
> > > > > +1 for rolling.
> > > > >
> > > > > Nice found, Zheng Hu. (y)
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > R.C
> > > > >
> > > > >
> > > > >
> > > > > 
> > > > > From: Stack 
> > > > > Sent: 06 December 2018 12:03
> > > > > To: HBase Dev List
> > > > > Subject: Rolling 2.1.2 and 2.0.4
> > > > >
> > > > > 2.1.1 and 2.0.3 have an ugly bug found by our Zheng Hu. See
> > HBASE-21551
> > > > > Memory leak when use scan with STREAM at server side.
> > > > >
> > > > > S
> > > > >
> > > >
> > >
> >
>


Re: Nice article on one of our own....

2019-01-09 Thread Nick Dimiduk
Congratulations and thank you, Duo, for all your efforts!

On Wed, Jan 9, 2019 at 10:42 AM Stack  wrote:

> See #3 in list of top 5 Apache committers:
> https://www.cbronline.com/feature/apache-top-5
> S
>


Re: HBase Website Cleanup task

2019-02-04 Thread Nick Dimiduk
My only concern is that all my searches targeting our user guide land in
0.94 docs. I don't know why the SEO lands us thusly, but I fear that
suddenly it'll look to user searches like the project is dead.

On Mon, Feb 4, 2019 at 11:46 AM Misty Linville  wrote:

> +1
>
> On Mon, Feb 4, 2019 at 10:04 AM Stack  wrote:
>
> > Sounds good by me.
> > S
> >
> > On Mon, Feb 4, 2019 at 7:40 AM Peter Somogyi 
> wrote:
> >
> > > Does anyone have any concern with removing 0.94 documentation
> completely
> > > from hbase.apache.org? Currently it is hosted at
> > > https://hbase.apache.org/0.94/ to which we don't have any direct link
> > but
> > > with a few hops a user can reach this area. This release is almost 4
> > years
> > > old!
> > >
> > > I had a chat with Sean Busbey and he mentioned 0.94 could be removed
> from
> > > the site and we could send a NOTICE to user@ that the documentation
> can
> > be
> > > found in the final 0.94 tarball in case anyone still need it. The
> > > tarball[1] has the same content in docs directory.
> > >
> > > What do you think?
> > >
> > > Peter
> > >
> > > [1] http://archive.apache.org/dist/hbase/hbase-0.94.27/
> > >
> > > On Tue, Jan 29, 2019 at 10:18 PM Sakthi 
> > > wrote:
> > >
> > > > FYI: This is the job I'm talking about:
> > > > https://builds.apache.org/job/HBase%20Website%20Link%20Checker/
> > > >
> > > > Regards,
> > > > Sakthi
> > > >
> > > > On Tue, Jan 29, 2019 at 11:17 AM Sakthi 
> > > > wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > The most recent (and many of the previous) "*HBase Website Link
> > > Checker*"
> > > > > job of ours has reported several issues with the website. The job
> > takes
> > > > > roughly 20+ hours to finish and has reported "*1458 missing files*"
> > and
> > > > "*7143
> > > > > missing named anchors*".
> > > > >
> > > > > *Quick facts:* Of the 1400+ missing files issues, around 1200 are
> > from
> > > > > 1.2, 90 from 0.94, 40 from 2.0, 30 from 2.1 and rest from the
> master.
> > > > >
> > > > > I have filed an *Umbrella JIRA:* HBASE-21803
> > > > >  ("HBase
> Website
> > > > > Cleanup"). If folks find any other website related issues that we
> > would
> > > > > like to address, we can use this umbrella to track it.
> > > > >
> > > > > Any suggestions are welcome!
> > > > >
> > > > > Regards,
> > > > > Sakthi
> > > > >
> > > >
> > >
> >
>


Broken Blog Images

2019-03-06 Thread Nick Dimiduk
Heya,

There's some nice posts up on the Apache blog, such as [0]. Sadly, it looks
like the image refs are broken. Refs are pointing off to some 3rd party
content hosting site. Maybe they can be updated to local-to-the-blog
content?

Thanks,
Nick

[0]: https://blogs.apache.org/hbase/entry/hbase_zk_less_region_assignment


Re: GetAndPut

2019-03-25 Thread Nick Dimiduk
Hi JM,

I wonder if your usecase is supported by the multi operation. If you enable
multiple cell versions on the store, you should be able to send a single
multi RPC containing first the CAX, then the multi-version get. The get
portion of the returned result would contain all the cell values you
require. (Be advised, this likely remains a source of data-race bugs when
you have multiple clients writing to a cell; I tend to stay away from CAX
operations for this reason.)

However, I’m not sure if multi support for CAX operations was ever
implemented.
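For illustration, the get-then-CAX retry loop amounts to this standalone sketch (an in-memory stand-in: `Cell` and `compare_and_put` are illustrative names playing the roles of Table.get and checkAndPut, not HBase APIs):

```python
# Self-contained sketch of the optimistic "repeat until success" pattern:
# read the current value, then conditionally replace it; on success we know
# exactly which value was replaced (the GetAndPut semantics).
class Cell:
    def __init__(self, value=None):
        self.value = value

    def get(self):
        return self.value

    def compare_and_put(self, expected, new_value):
        # Stand-in for HBase's checkAndPut: replace only if unchanged.
        if self.value == expected:
            self.value = new_value
            return True
        return False

def get_and_put(cell, new_value):
    # Return the value that new_value replaced.
    while True:
        previous = cell.get()                          # Get previous value
        if cell.compare_and_put(previous, new_value):  # CheckAndPut
            return previous
        # Lost a race with another writer; retry with the fresh value.

cell = Cell("v1")
print(get_and_put(cell, "v2"))  # v1
print(cell.get())               # v2
```

Against a real table each round trip is two RPCs, which is exactly the cost the thread is trying to avoid; a server-side coprocessor would be the way to make it one call.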

Bonne chance,
-n

On Mon, Mar 25, 2019 at 12:23 PM Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> Well, I just don't want it to fail. I want to put a value that will replace
> the previous one and just return it (the previous one).
>
> It can be something like:
> repeat until success {
>   Get previous value
>   CheckAndPut(new value, previous value)
> }
> Then I know what I replaced by what.
>
> The usecase is where someone wants to keep track of what has been modified.
> A bit like a client side WAL. But they want that ONLY for updates. They
> don't care about new inserts. And since there is 99% inserts and only 1%
> updates, they don't want to just keep all puts.
>
> JMS
>
> On Mon, Mar 25, 2019 at 15:16, Vladimir Rodionov  wrote:
>
> > Interesting. If CheckAndPut succeeds, then you know the value and no need
> > for Get, right?
> > Only if it fails do you want to know the current value?
> > Can you elaborate on your use case, Jean-Marc?
> >
> > -Vlad
> >
> > On Mon, Mar 25, 2019 at 11:54 AM Jean-Marc Spaggiari <
> > jean-m...@spaggiari.org> wrote:
> >
> > > Hi all,
> > >
> > > We have all CheckAndxxx operations, where we verify something and if
> the
> > > condition is true we perform the operation (Put, Delete, Mutation,
> etc.).
> > >
> > > I'm looking for a GetAndPut operation. Where in a single atomic call, I
> > can
> > > get the actual value of a cell (if any), and perform the put. Working
> on
> > a
> > > usecase where this might help.
> > >
> > > Do we have anything like that? I can simulate by doing a Get then a
> > > CheckAndPut, but that's 2 calls. Trying to save one call ;)
> > >
> > > Do we have anything like that?
> > >
> > > Thanks
> > >
> > > JMS
> > >
> >
>


Re: Are we ready to accept PR commit from github?

2019-03-27 Thread Nick Dimiduk
Can we have the gitbox message-per-github-action piped to commits@ instead
of dev@ ?

On Tue, Mar 26, 2019 at 6:15 AM Josh Elser  wrote:

> Yep, the trigger to call Jenkins from Github PR's is something Infra
> controls. All I know is that I sent an email to us...@infra.apache.org
> explaining what I was trying to do, and an Infra team member said they
> "enabled two options" for me and then everything worked.
>
> Hopefully, if you just tell them "Trying to invoke ASF Jenkins from
> Github PR", that'll be enough to get them to do it for all of our GH
> repos :)
>
> On 3/25/19 10:40 PM, OpenInx wrote:
> > I see the guide in
> >
> https://cwiki.apache.org/confluence/display/INFRA/GitHub+Pull+Request+Builder,
>
> > I think we need create some
> > webhook in our apache/hbase github repository.
> >
> > Oh, maybe I can have a try in my own forked github repo, if work, then I
> > can put it in apache/hbase repo.
> >
> > On Tue, Mar 26, 2019 at 10:27 AM Sean Busbey  > > wrote:
> >
> > Access to Jenkins is granted by the PMC chair. When I'm back at a
> laptop
> > tomorrow I can post the guide for access.
> >
> >   You shouldn't need access to the setting of the GitHub repos to do
> > this;
> > I'm not even sure what gitbox integration gives us. I'll confirm by
> > checking how the two projects I know to have this set up (Hadoop and
> > Yetus).
> >
> > On Mon, Mar 25, 2019, 21:23 OpenInx  > > wrote:
> >
> >  > @stack boss, Can you  see the settings  also the jenkins config
> > ?  If so,
> >  > mind you grant those permissions?
> >  >
> >  > On Tue, Mar 26, 2019 at 10:17 AM 张铎(Duo Zhang)
> > mailto:palomino...@gmail.com>>
> >  > wrote:
> >  >
> >  > > I can not see the settings tab on github either...
> >  > >
> >  > > OpenInx mailto:open...@gmail.com>> 于2019年
> > 3月26日周二 上午10:11写道:
> >  > >
> >  > >> So the current problem is:  github PR can not trigger hadoop
> > QA ?  I
> >  > will
> >  > >> take some time to let this work, but
> >  > >> seems I've no setting permissions in github repo, also no
> > permission to
> >  > >> view the jenkins config.
> >  > >> @Sean Busby, @Josh Elser  > > @张铎(Duo Zhang)
> >  > >> mailto:palomino...@gmail.com>>  Could
> > you help to grant those permission to me
> >  > >> ?
> >  > >>
> >  > >> On Fri, Mar 15, 2019 at 11:28 PM Sean Busbey
> > mailto:bus...@apache.org>> wrote:
> >  > >>
> >  > >>> I suspect this blog post on getting Yetus checks against
> > GitHub PRs
> >  > >>> working will be more directly applicable:
> >  > >>>
> >  > >>>
> >  > >>>
> >  >
> >
> https://effectivemachines.com/2019/01/24/using-apache-yetus-with-jenkins-and-github-part-1/
> >  > >>>
> >  > >>> Of course, the examples Peter has gotten working on
> > hbase-connectors
> >  > >>> and hbase-operator-tools will probably be most applicable,
> > but IIRC
> >  > >>> they don't provide feedback directly on the github PR, so
> > contributors
> >  > >>> will have to click through to jenkins.
> >  > >>>
> >  > >>> On Fri, Mar 15, 2019 at 9:42 AM Josh Elser  > > wrote:
> >  > >>> >
> >  > >>> > Of course, found them right after I sent this :)
> >  > >>> >
> >  > >>> >
> >  > >>>
> >  >
> >
> https://lists.apache.org/thread.html/c441a64c28547ca405b65ad97a7dc78601d33149eb8fcc64aee88eeb@%3Cbuilds.apache.org%3E
> >  > >>> >
> >  > >>>
> >  >
> >
> https://lists.apache.org/thread.html/a21448239fe5dab8a486df72f032d84aade7425d1dac625b0410b63c@%3Cbuilds.apache.org%3E
> >  > >>> >
> >  > >>> > ChrisT gave me the necessary clarification to get things
> > working.
> >  > >>> > Hopefully between our current Jenkins automation and the
> > job I sent
> >  > >>> > below, it'd be enough to get something functional if you
> > have the
> >  > time
> >  > >>> > to devote to it :)
> >  > >>> >
> >  > >>> > On 3/15/19 10:40 AM, Josh Elser wrote:
> >  > >>> > > I went through this exercise recently, trying to get it
> > set up for
> >  > >>> Ratis:
> >  > >>> > >
> >  > >>> > > The result is at
> >  > >>> > >
> >  >
> https://builds.apache.org/job/PreCommit-RATIS-github-pull-requests/
> >  > >>> > >
> >  > >>> > > Infra has docs up at
> >  > >>> > >
> >  > >>>
> >  >
> >
> https://cwiki.apache.org/confluence/display/INFRA/GitHub+Pull+Request+Builder
> >  > >>> .
> >  > >>> > > I remember them not being 100% accurate but I'm having
> > trouble
> >  > >>> finding
> >  > >>> > > the thread where I was talking to them about this. Will
> > write back
> >  > >>> if I
> >  > >>> > > find it.
> >  > >>> > >
> >  > >>> > > On 3/15/19 12:14 AM, Yu Li wrote

Re: [VOTE] Send GitHub notification to issues@hbase

2019-03-27 Thread Nick Dimiduk
+1 from me, to be sure :)

Thanks Peter.

On Wed, Mar 27, 2019 at 12:29 PM Peter Somogyi  wrote:

> Hi,
>
> Right now notification emails from GitHub activities are sent to dev@hbase
> for https://github.com/apache/hbase repository. Nick Dimiduk had a
> suggestion that these should be forwarded to a different hbase mailing
> list.
>
> Since these kinds of emails are already sent to issues@hbase for the rest
> of our repositories (hbase-operator-tools, hbase-connectors and hbase-site)
> I recommend doing the same for hbase as well.
>
> Please vote whether you agree to make the change or you're against it.
>
> Thanks,
> Peter
>


Re: [DISCUSS] Reminder: please use 'rebase' or 'squash' when accepting PRs via github

2019-04-23 Thread Nick Dimiduk
I am +1 for linear commit history.

Does the “squash” option give the committer enough control over the commit
message format and structure? I personally prefer to perform all of my
commit rewriting locally, verify it locally, and then force-push the
desired history to the feature branch before using the “rebase” variant of
the button. Maybe that process is not necessary with the “squash” option?

Thanks,
Nick

On Tue, Apr 23, 2019 at 7:33 AM Sean Busbey  wrote:

> Folks,
>
> Looking at history for the current master branch, we had several PRs
> accepted over the last week where the committer used the "merge
> commit" option. As a reminder, previous consensus was that we would
> avoid this since it makes the history harder to follow.
>
> Please ensure you are selecting either "rebase commits" or "squash
> commits" when accepting a PR.
>
> I have filed INFRA-18264 to disable the merge commit option.
>


Re: [DISCUSS] Reminder: please use 'rebase' or 'squash' when accepting PRs via github

2019-04-23 Thread Nick Dimiduk
> I prefer multiple commits in a single PR compared to force pushing to
feature branch because it makes incremental reviewing simpler.

I agree incremental commits make reviewing simpler. However, we have a long
tradition of 1 JIRA, 1 commit. This makes it easier for release managers to
carry out what is still a manual business of (1) verifying that the
branch history, JIRA fixVersions, and CHANGES.txt all align and (2)
selectively reverting changes identified as having caused issues with a
release candidate.

How do we weigh these competing concerns?

On Tue, Apr 23, 2019 at 6:21 PM Nick Dimiduk  wrote:
>
> > I am +1 for linear commit history.
> >
> > Does the “squash” option give the committer enough control over the
> commit
> > message format and structure? I personally prefer to perform all of my
> > commit rewriting locally, verify it locally, and then force-push the
> > desired history to the feature branch before using the “rebase” variant
> of
> > the button. Maybe that process is not necessary with the “squash” option?
> >
> > Thanks,
> > Nick
> >
> > On Tue, Apr 23, 2019 at 7:33 AM Sean Busbey  wrote:
> >
> > > Folks,
> > >
> > > Looking at history for the current master branch, we had several PRs
> > > accepted over the last week where the committer used the "merge
> > > commit" option. As a reminder, previous consensus was that we would
> > > avoid this since it makes the history harder to follow.
> > >
> > > Please ensure you are selecting either "rebase commits" or "squash
> > > commits" when accepting a PR.
> > >
> > > I have filed INFRA-18264 to disable the merge commit option.
> > >
> >
>


Re: Using libraries licensed under LGPL

2019-05-23 Thread Nick Dimiduk
My understanding of [0] is that various versions of LGPL are explicitly
prohibited from use as a dependency in Apache Foundation projects.

I’m not a lawyer.

[0]:
https://apache.org/legal/resolved.html#category-x

On Thu, May 23, 2019 at 7:18 AM 张铎(Duo Zhang)  wrote:

> HBase itself cannot depend on libraries which are licensed under LGPL. A
> possible way is to create a new project, which depends on both HBase and the
> LGPL library?
>
> Toshihiro Suzuki wrote on Thursday, May 23, 2019 at 10:03 PM:
>
> > Hi folks!
> >
> > I'm building htop in HBASE-11062 now and using Lanterna library to make a
> > Unix top like user interface:
> > https://github.com/mabe02/lanterna
> >
> > Lanterna is a Java library allowing you to write easy semi-graphical user
> > interfaces in a text-only environment, very similar to the C library
> > curses.
> >
> > However, I found Lanterna library is licensed under the LGPL.
> > https://github.com/mabe02/lanterna/blob/master/License.txt
> >
> > According to the Apache website (
> > http://www.apache.org/legal/3party.html#criteriaandcategories), it looks
> > like the LGPL License is incompatible with the Apache License, but I'm not sure
> if
> > we should not use libraries licensed under LGPL.
> >
> > Could anyone advise me on it?
> >
> > Regards,
> > Toshi
> >
>


Re: [VOTE] First release candidate for HBase 2.1.6 is available for download

2019-08-21 Thread Nick Dimiduk
Maybe this RC is already sunk due to HBASE-22823, but FYI, the test added
on HBASE-22169 is failing consistently for me. I added a comment to the
jira with my diagnostics; I think the fix is quite simple.

On Wed, Aug 21, 2019 at 12:20 AM Sakthi  wrote:

> +1 (non-binding)
>
> Java version: Amazon Corretto 8 - Corretto-8.222.10.1 (build 1.8.0_222-b10)
>
>1. Checksums and signatures(both src and bin): OK
>2. Rat Check: OK
>3. Built from source: OK
>4. Basic Shell CRUD commands: OK
>5. Local Installation mode:
>   1. Web UI: OK
>   2. LTT with 1M rows: OK
>
> - Sakthi
>
> On Mon, Aug 19, 2019 at 12:08 PM Artem Ervits 
> wrote:
>
> > changing my vote to +1, recompiled with Hadoop 2.8 and was able to run
> > through some shell, MR, etc. Logs, UI look good.
> >
> > On Sun, Aug 18, 2019 at 12:41 PM Andrew Purtell <
> andrew.purt...@gmail.com>
> > wrote:
> >
> > > Like I said on the JIRA I’d approve a patch that splits the canary API
> we
> > > need from a tool chassis. We have used and contributed to Canary for a
> > long
> > > time and changing the annotation was simply acknowledging our use and
> > > continued contribution to this code. We can also just copy Canary into
> > our
> > > monitoring tool suite and fork it but prefer to continue contributing
> > > canary improvements to open source.
> > >
> > > > On Aug 18, 2019, at 6:03 AM, 张铎(Duo Zhang) 
> > > wrote:
> > > >
> > > > Due to HBASE-22823 I plan to sink this RC and find a way to deal with
> > the
> > > > IA.Public Canary.
> > > >
> > > >> OpenInx wrote on Sunday, Aug 18, 2019 at 4:09 PM:
> > > >
> > > >> See the previous failed UT: it says the address was already bound. Running
> the
> > > UT
> > > >> several times again on a new host,
> > > >> all seems to work fine.  So +1.
> > > >>
> > > >>> On Sun, Aug 18, 2019 at 3:56 PM OpenInx  wrote:
> > > >>>
> > > >>> Let me try the TestJMXConnectorServer locally again, although it
> > failed
> > > >>> now, it should not affect the RC vote.
> > > >>> I give my +1, and will take a look at that UT.
> > > >>>
> > >  On Sun, Aug 18, 2019 at 3:15 PM OpenInx 
> wrote:
> > > 
> > > 
> > > * Signature: ok
> > > * Checksum : ok
> > > * Rat check (1.8.0_202): ok
> > >  - mvn clean apache-rat:check
> > > * Built from source (1.8.0_202): ok
> > >  - mvn clean install -DskipTests
> > > * Unit tests pass (1.8.0_202): failed
> > >  - mvn test -P runAllTests
> > > 
> > >  The failed UT are:
> > > 
> > >  [INFO]
> > >  [ERROR] Errors:
> > >  [ERROR]
> > > 
> > > >>
> > >
> >
> org.apache.hadoop.hbase.TestJMXConnectorServer.testHMConnectorServerWhenShutdownCluster(org.apache.hadoop.hbase.TestJMXConnectorServer)
> > >  [ERROR]   Run 1:
> > > 
> TestJMXConnectorServer.testHMConnectorServerWhenShutdownCluster:154
> > »
> > > IO
> > >  Shutt...
> > >  [ERROR]   Run 2: TestJMXConnectorServer.tearDown:73 NullPointer
> > >  [INFO]
> > >  [ERROR]
> > > 
> > > >>
> > >
> >
> org.apache.hadoop.hbase.TestJMXConnectorServer.testHMConnectorServerWhenStopMaster(org.apache.hadoop.hbase.TestJMXConnectorServer)
> > >  [ERROR]   Run 1:
> > >  TestJMXConnectorServer.testHMConnectorServerWhenStopMaster:85 » IO
> > > >> Shutting
> > >  do...
> > >  [ERROR]   Run 2: TestJMXConnectorServer.tearDown:73 NullPointer
> > >  [INFO]
> > >  [ERROR]
> > > 
> > > >>
> > >
> >
> org.apache.hadoop.hbase.TestJMXConnectorServer.testRSConnectorServerWhenStopRegionServer(org.apache.hadoop.hbase.TestJMXConnectorServer)
> > >  [ERROR]   Run 1:
> > > 
> > TestJMXConnectorServer.testRSConnectorServerWhenStopRegionServer:123 »
> > > >> IO
> > >  Shut...
> > >  [ERROR]   Run 2: TestJMXConnectorServer.tearDown:73 NullPointer
> > >  [INFO]
> > >  [WARNING] Flakes:
> > >  [WARNING]
> > > 
> > > >>
> > >
> >
> org.apache.hadoop.hbase.master.TestRestartCluster.testRetainAssignmentOnRestart(org.apache.hadoop.hbase.master.TestRestartCluster)
> > >  [ERROR]   Run 1:
> > TestRestartCluster.testRetainAssignmentOnRestart:170
> > > »
> > >  Runtime Failed construc...
> > >  [INFO]   Run 2: PASS
> > >  [INFO]
> > >  [WARNING]
> > > 
> > > >>
> > >
> >
> org.apache.hadoop.hbase.master.cleaner.TestSnapshotFromMaster.testAsyncSnapshotWillNotBlockSnapshotHFileCleaner(org.apache.hadoop.hbase.master.cleaner.TestSnapshotFromMaster)
> > >  [ERROR]   Run 1:
> > > 
> > > >>
> > >
> >
> TestSnapshotFromMaster.testAsyncSnapshotWillNotBlockSnapshotHFileCleaner:445
> > >  [INFO]   Run 2: PASS
> > >  [INFO]
> > >  [WARNING]
> > > 
> > > >>
> > >
> >
> org.apache.hadoop.hbase.replication.TestReplicationTrackerZKImpl.testGetListOfRegionServers(org.apache.hadoop.hbase.replication.TestReplicationTrackerZKImpl)
> > >  [ERROR]   Run 1:
> > >  TestReplicationTrackerZKImpl.testGetListOfRegionServers:130
> > > expected:<1>
> > > 

Re: [VOTE] Second release candidate for HBase 2.1.6 is available for download

2019-08-28 Thread Nick Dimiduk
+1

using this fancy new hbase-vote.sh:
* Signature: ok
* Checksum : ok
* Rat check (1.8.0_222): ok
 - mvn clean apache-rat:check
* Built from source (1.8.0_222): ok
 - mvn clean install -DskipTests
* Unit tests pass (1.8.0_222): failed
 - mvn package -P runAllTests

I have only been able to get about 1/10 full test runs to complete, largely
due to timeouts in test setup.

On Mon, Aug 26, 2019 at 7:54 AM Duo Zhang  wrote:

> The second release candidate for HBase 2.1.6 is available for download:
>
>   https://dist.apache.org/repos/dist/dev/hbase/2.1.6RC1
>
> Maven artifacts are also available in a staging repository at:
>
>   https://repository.apache.org/content/repositories/orgapachehbase-1331/
>
> Artifacts are signed with my key (9AD2AE49) published in our KEYS file at
>
>   http://www.apache.org/dist/hbase/KEYS
>
> The tag to be voted on is 2.1.6RC1:
>
>   https://github.com/apache/hbase/tree/2.1.6RC1
>
> HBase 2.1.6 is the seventh maintenance release in the HBase 2.1 line,
> continuing on the theme of bringing a stable, reliable database to the
> Hadoop and NoSQL communities. It fixes several CVEs related to jackson by
> upgrading the jackson-databind dependency to 2.9.9.2; all HBase users are
> highly recommended to upgrade, especially those who use hbase-rest.
>
> 2.1.6 includes ~152 bug and improvement fixes done since the 2.1.5. There
> are several critical fixes around WAL, which may cause WAL corruption or
> hang the region server. Please see HBASE-22539, HBASE-22681, HBASE-22684
> for more details.
>
> The detailed source and binary compatibility report vs 2.1.5 has been
> published for your review, at:
>
>
>
> https://dist.apache.org/repos/dist/dev/hbase/2.1.6RC1/compatibility_report_2.1.5vs2.1.6RC1.html
>
> The report shows no incompatibilities.
>
> The full list of fixes included in this release is available in the
> CHANGES.md that ships as part of the release also available here:
>
>   https://dist.apache.org/repos/dist/dev/hbase/2.1.6RC1/CHANGES.md
>
> The RELEASENOTES.md are here:
>
>   https://dist.apache.org/repos/dist/dev/hbase/2.1.6RC1/RELEASENOTES.md
>
> Please try out this candidate and vote +1/-1 on whether we should release
> these artifacts as HBase 2.1.6.
>
> The VOTE will remain open for at least 72 hours.
>
> Thanks
>


Re: Flaky test and HBase-adhoc-run-tests job

2019-08-31 Thread Nick Dimiduk
Hi Sakthi,

I apologize for the delay of reply. I’m not familiar with the details of
this job, nor am I a recent participant in the community, so I was hoping
someone with more recent experience would answer these questions.

To the best of my knowledge, we have no objection to creating jira-specific
branches in the repo, there are plenty of examples of this in our past.
Usually they’re done as a means of multi-developer collaboration, but
that’s not by policy. I think what you propose is a fine use-case for
pushing a branch. Please observe good hygiene in the shared space and clean
up after yourself when finished.

Thanks,
Nick

On Thu, Aug 22, 2019 at 14:23 Sakthi  wrote:

> Hello,
>
> I would like to test the fix to one of the flakies (HBASE-22895). For that,
> I want to utilize our HBase-adhoc-run-tests job to run the test repeatedly
> (~50 times) with the fix and see if it helps before pushing in the fix. I
> see that currently we only allow any of the branches from the hbase repo to
> be used for the testing in the job.
>
> Does that mean that I can create a branch(HBASE-22895) in the repo, push
> the fix there, run the job & when the issue is rectified, push the fix &
> delete the branch? Or is creating new ad-hoc branches in the repo not
> really necessary, or not the right way?
>
> Would appreciate your suggestions.
>
> -Sakthi
>


Re: Failure: HBase Generate Website

2019-09-03 Thread Nick Dimiduk
Thanks for checking.

On Tue, Sep 3, 2019 at 08:11 Josh Elser  wrote:

> In case someone else was meaning to look at this:
>
> Seems to be transient. status.a.o doesn't go back far enough to see if
> this was an acknowledged issue.
>
> 
> Connect to repository.apache.org:80
> [repository.apache.org/207.244.88.140] failed: Connection timed out
> (Connection timed out)
> 
>
> 4 green builds since then.
>
> On 8/31/19 10:45 AM, Apache Jenkins Server wrote:
> > Build status: Failure
> >
> > The HBase website has not been updated to incorporate HBase commit
> ${CURRENT_HBASE_COMMIT}.
> >
> > See https://builds.apache.org/job/hbase_generate_website/1775/console
> >
>


Re: [VOTE] The first HBase Operator Tools 1.0.0 release candidate (RC0) is available

2019-09-18 Thread Nick Dimiduk
+1

 - Verified signatures.
 - Verified checksums.
 - Built from source tarball successfully.
 - Ran unit tests from source tarball, pass.
 - Ran rat check from source tarball, pass.


On Mon, Sep 16, 2019 at 11:31 AM Peter Somogyi  wrote:

> Please vote on this Apache HBase Operator Tools Release Candidate (RC),
> 1.0.0.
>
> The VOTE will remain open for at least 72 hours.
>
> [ ] +1 Release this package as Apache HBase Operator Tools 1.0.0
> [ ] -1 Do not release this package because ...
>
> The tag to be voted on is 1.0.0RC0:
>
>  https://github.com/apache/hbase-operator-tools/tree/1.0.0RC0
>
> The release files, including signatures, digests, as well as CHANGES.md
> and RELEASENOTES.md included in this RC can be found at:
>
>  https://dist.apache.org/repos/dist/dev/hbase/1.0.0RC0/
>
> Maven artifacts are available in a staging repository at:
>
>  https://repository.apache.org/content/repositories/orgapachehbase-1348/
>
> Artifacts were signed with the psomo...@apache.org key which can be found
> in:
>
>  https://dist.apache.org/repos/dist/release/hbase/KEYS
>
> To learn more about Apache HBase Operator Tools, please see
> http://hbase.apache.org/
>
> Thanks,
> Your HBase Release Manager
>


[DISCUSS] Multiplicity of repos and release tooling

2019-09-18 Thread Nick Dimiduk
Heya,

Looks like we have quite a few repos now, each of which must produce
artifacts that follow the Apache protocols. I also see we have some nice
tools built up in dev-support in the main repo for RM's who build release
candidates and community members to vote on them. I tried to use our
hbase-vote.sh on this operator-tools RC and found it mostly works. I think
a few adjustments on each end would allow these tools to work across repos,
so I filed HBASE-23048. However, I now see that operator-tools has its own
dev-support/create-release, so I wonder what direction we want to take with
this automation, so here I come to the list.

Do we want to have independent tooling for each repo? Are the processes of
building RC's different enough to warrant separate tools? Is a single tool
that can build an RC for all of them not worth the trouble? At the very
least, I think the hbase-vote.sh can be made to work with releases
generated from each of the repos, as it's not doing all that much.

Thoughts?

Thanks,
Nick

On Wed, Sep 18, 2019 at 9:42 AM Nick Dimiduk  wrote:

> +1
>
>  - Verified signatures.
>  - Verified checksums.
>  - Built from source tarball successfully.
>  - Ran unit tests from source tarball, pass.
>  - Ran rat check from source tarball, pass.
>
>
> On Mon, Sep 16, 2019 at 11:31 AM Peter Somogyi 
> wrote:
>
>> Please vote on this Apache HBase Operator Tools Release Candidate (RC),
>> 1.0.0.
>>
>> The VOTE will remain open for at least 72 hours.
>>
>> [ ] +1 Release this package as Apache HBase Operator Tools 1.0.0
>> [ ] -1 Do not release this package because ...
>>
>> The tag to be voted on is 1.0.0RC0:
>>
>>  https://github.com/apache/hbase-operator-tools/tree/1.0.0RC0
>>
>> The release files, including signatures, digests, as well as CHANGES.md
>> and RELEASENOTES.md included in this RC can be found at:
>>
>>  https://dist.apache.org/repos/dist/dev/hbase/1.0.0RC0/
>>
>> Maven artifacts are available in a staging repository at:
>>
>>  https://repository.apache.org/content/repositories/orgapachehbase-1348/
>>
>> Artifacts were signed with the psomo...@apache.org key which can be found
>> in:
>>
>>  https://dist.apache.org/repos/dist/release/hbase/KEYS
>>
>> To learn more about Apache HBase Operator Tools, please see
>> http://hbase.apache.org/
>>
>> Thanks,
>> Your HBase Release Manager
>>
>


Re: [DISCUSS] Multiplicity of repos and release tooling

2019-09-19 Thread Nick Dimiduk
On Wed, Sep 18, 2019 at 10:30 AM Sean Busbey  wrote:

> If the tooling is in one place where will it live?
>
> As an RM I like not needing to check out more than the repo I'm trying to
> get a release out on.
>

My initial thinking was all would be in the main repo, but this is contrary
to your above statement. As I see it, everyone working on HBase has a
checkout of the main repo handy, so for releases spun on developer machines
it's "no big deal." If we can ever get to releases spun through automated
environments, it's all scripted anyway and thus "no big deal."

In particular my release machine is slow on disk and so updates to the main
> git repo when trying to release something not in the main repo will be
> painful.


"slow on disk" as in iops rate or "low on disk" as in capacity? Either way,
I'm surprised to hear about this as a barrier. Is there a side-conversation
we can have about trimming the fat out of our repo(s)? Some git maintenance
that can/needs to be done? I was recently shocked by the girth of our repos
-- nearly 0.5g on the main repo and a whopping 4g on the site!

For the most part this is also why I usually manually build RCs
> that I run for the main repo, because I can do a shallow clone of the
> release branch instead of needing to get updates to all the active
> branches.
>

This makes me sad. Automate more, not less! Altering the automation to make
shallow clones of targeted branches should be simple enough, no?
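
The shallow, single-branch clone in question can be sketched like this. It is a self-contained illustration against a throwaway local repository; for a real release one would substitute the apache/hbase URL and the actual release branch.

```shell
# Sketch of the shallow, single-branch clone an RM could use instead of a full
# checkout. Repository and branch names below are illustrative.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q upstream
cd upstream
git config user.email rm@example.com
git config user.name rm
echo 1 > f && git add f && git commit -qm "first"
echo 2 >> f && git commit -qam "second"
git branch -q branch-2.1                # the release branch to be cloned
cd ..
# --depth 1 fetches only the tip commit; --single-branch skips all other branches.
# (The file:// form is needed for --depth to take effect on a local clone.)
git clone -q --depth 1 --single-branch --branch branch-2.1 \
  "file://$tmp/upstream" rc-workspace
cd rc-workspace
git rev-list --count HEAD               # 1: only the tip commit was fetched
```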

For testing RCs I guess it's currently all some combination of the
> foundation policies that should be the same and maybe maven profiles?
>

I agree that codification of foundation policy is the baseline. That would
be enough for me as a first pass. After that, perhaps layers of increasing
sophistication, perhaps repo-dependent. I don't follow your meaning re:
maven profiles.

Iirc there was already some confusion about using the testing script that
> came in the main repo source bundle vs the one on master.
>

This is a good point. I wonder if `dev-support` should be pruned or purged
from all branches other than master. Maybe the CI-related stuff branches
into its own directory or directories, and we keep everything else limited
to a single canonical copy on master. This strategy would eliminate
confusion re: what is authority in any given situation but perhaps is too
constraining, given the number of active release branches we maintain.

What issue are we trying to solve?
>

Minimize contributor friction. Make it easy for every subscriber to dev@ to
say "There's a new RC posted and I have an idle machine for an hour / for
the evening / for the weekend, let's just kick the tires." Make it easy for
everyone who's learned the RC tooling on one branch to pinch-hit in on
another branch or another repo. I hear a constant complaint of "scarcity of
volunteer hours" and "I wish we had more people voting in RCs" and "I wish
we could keep a monthly release cadence on every branch we're maintaining".
So I'd like to see a focused effort on maximizing the volunteer human and
machine time that's thrown our way.

Thanks,
Nick

On Wed, Sep 18, 2019, 11:50 Nick Dimiduk  wrote:
>
> > Heya,
> >
> > Looks like we have quite a few repos now, each of which must produce
> > artifacts that follow the Apache protocols. I also see we have some nice
> > tools built up in dev-support in the main repo for RM's who build release
> > candidates and community members to vote on them. I tried to use our
> > hbase-vote.sh on this operator-tools RC and found it mostly works. I
> think
> > a few adjustments on each end would allow these tools to work across
> repos,
> > so I filed HBASE-23048. However, I now see that operator-tools has its
> own
> > dev-support/create-release, so I wonder what direction we want to take
> with
> > this automation, so here I come to the list.
> >
> > Do we want to have independent tooling for each repo? Are the processes
> of
> > building RC's different enough to warrant separate tools? Is a single
> tool
> > that can build an RC for all of them not worth the trouble? At the very
> > least, I think the hbase-vote.sh can be made to work with releases
> > generated from each of the repos, as it's not doing all that much.
> >
> > Thoughts?
> >
> > Thanks,
> > Nick
> >
> > On Wed, Sep 18, 2019 at 9:42 AM Nick Dimiduk 
> wrote:
> >
> > > +1
> > >
> > >  - Verified signatures.
> > >  - Verified checksums.
> > >  - Built from source tarball successfully.
> > >  - Ran unit tests from source tarball, pass.
> > >  - Ran rat check from s

[DISCUSS] Limiting gitbox PR email

2019-09-28 Thread Nick Dimiduk
Heya,

I would like our dev@ subscription to gitbox notifications to match that of
our JIRA notifications — just open, close of issues. Right now we’re
getting every comment. Like JIRA, users are able to “watch” individual PRs
that they find to be of interest.

Any objections if I engage infra on making this change?

Thanks,
Nick


Re: [DISCUSS] Multiplicity of repos and release tooling

2019-09-28 Thread Nick Dimiduk
On Sat, Sep 28, 2019 at 10:58 Stack  wrote:

> hbase-operator-tools/create-release or a new repo altogether?
>

If we’re going to have one tool to release them all, I’m okay with it
staying in the main repo under dev-support or similar, so long as it’s on
only one branch (probably master, maybe it’s own). If the tools is itself
is going to be released and made installable (on a developer laptop or a
Jenkins worker) then I think it should have its own repo.

> On Wed, Sep 25, 2019 at 8:56 AM Peter Somogyi  wrote:
> > >
> > > It is great that you brought up this topic Nick! I agree that the
> optimal
> > > solution would be to host all the release related scripts (RC build,
> > > verifier) in a common place.
> > >
> > > The RC-making script in hbase-operator-tools that Stack fine-tuned is
> meant
> > > to work with different artifacts. The current version there gives the
> RM
> > a
> > > smooth experience. It emerged from HBase's create-release script and
> > > hopefully it can be used on other releases as well. There are some
> > > constraints of the tool like Jira versions should use
> > > `hbase-operator-tools` prefix.
> > >
> > > > This is a good point. I wonder if `dev-support` should be pruned or
> > purged
> > > from all branches other than master
> > >
> > > When we create branch-3 out of master then this becomes a problem
> again.
> > > Would it work if we use a specific branch for such tooling (e.g.
> > > dev-tools)? In this case RMs can just clone that branch and don't need
> > the
> > > whole HBase repository on their local machine.
> > >
> > > On Thu, Sep 19, 2019 at 5:40 PM Nick Dimiduk 
> > wrote:
> > >
> > > > On Wed, Sep 18, 2019 at 10:30 AM Sean Busbey 
> > wrote:
> > > >
> > > > > If the tooling is in one place where will it live?
> > > > >
> > > > > As an RM I like not needing to check out more than the repo I'm
> > trying to
> > > > > get a release out on.
> > > > >
> > > >
> > > > My initial thinking was all would be in the main repo, but this is
> > contrary
> > > > to your above statement. As I see it, everyone working on HBase has a
> > > > checkout of the main repo handy, so for releases spun on developer
> > machines
> > > > it's "no big deal." If we can ever get to releases spun through
> > automated
> > > > environments, it's all scripted anyway and thus "no big deal."
> > > >
> > > > In particular my release machine is slow on disk and so updates to
> the
> > main
> > > > > git repo when trying to release something not in the main repo will
> > be
> > > > > painful.
> > > >
> > > >
> > > > "slow on disk" as in iops rate or "low on disk" as in capacity?
> Either
> > way,
> > > > I'm surprised to hear about this as a barrier. Is there a
> > side-conversation
> > > > we can have about trimming the fat out of our repo(s)? Some git
> > maintenance
> > > > that can/needs to be done? I was recently shocked by the girth of our
> > repos
> > > > -- nearly 0.5g on the main repo and a whopping 4g on the site!
> > > >
> > > > For the most part this is also why I usually manually build RCs
> > > > > that I run for the main repo, because I can do a shallow clone of
> the
> > > > > release branch instead of needing to get updates to all the active
> > > > > branches.
> > > > >
> > > >
> > > > This makes me sad. Automate more, not less! Altering the automation
> to
> > make
> > > > shallow clones of targeted branches should be simple enough, no?
> > > >
> > > > For testing RCs I guess it's currently all some combination of the
> > > > > foundation policies that should be the same and maybe maven
> profiles?
> > > > >
> > > >
> > > > I agree that codification of foundation policy is the baseline. That
> > would
> > > > be enough for me as a first pass. After that, perhaps layers of
> > increasing
> > > > sophistication, perhaps repo-dependent. I don't follow your meaning
> re:
> > > > maven profiles.
> > > >
> > > > Iirc there was already some confusion about using the testing script
> > that
> > > > >

Re: [DISCUSS] Limiting gitbox PR email

2019-10-08 Thread Nick Dimiduk
After asking on users@infra, I've filed
https://issues.apache.org/jira/browse/INFRA-19243.

Thanks,
Nick

On Sat, Sep 28, 2019 at 7:02 PM Guanghao Zhang  wrote:

> +1. Too many emails now.
>
> 张铎(Duo Zhang) wrote on Sunday, Sep 29, 2019 at 9:28 AM:
>
> > +1
> >
> > > Andrew Purtell wrote on Sunday, Sep 29, 2019 at 9:08 AM:
> >
> > > +1
> > >
> > > Didn’t realize this was an option for PRs.
> > >
> > > > On Sep 28, 2019, at 10:17 AM, Nick Dimiduk 
> > wrote:
> > > >
> > > > Heya,
> > > >
> > > > I would like our dev@ subscription to gitbox notifications to match
> > > that of
> > > > our JIRA notifications — just open, close of issues. Right now we’re
> > > > getting every comment. Like JIRA, users are able to “watch”
> individual
> > > PRs
> > > > that they find to be of interest.
> > > >
> > > > Any objections if I engage infra on making this change?
> > > >
> > > > Thanks,
> > > > Nick
> > >
> >
>


Re: unsubscribe me please from this mailing list

2019-10-22 Thread Nick Dimiduk
Hi Lucien,

All Apache mailing lists are self-managed [0]. In this particular case,
you’ll need to send a mail to dev-unsubscribe@hbase.apache.org.

Thanks,
Nick

[0]:
https://www.apache.org/foundation/mailinglists.html

On Tue, Oct 22, 2019 at 00:41 Lucien Wang  wrote:

> Please unsubscribe me
>
> Thanks
>


Re: Request to add Supporting Project

2019-10-28 Thread Nick Dimiduk
Hi Manu,

Thanks for the contribution. You can make the addition yourself: please
submit a PR :) The file that manages the page in question is
https://github.com/apache/hbase/blob/master/src/site/xdoc/supportingprojects.xml
.

Thanks,
Nick

On Sun, Oct 27, 2019 at 3:53 PM Manu M. 
wrote:

> Dear HBase team,
>
> We at Flipkart are using a light-weight ORM wrapper on top of hbase-client:
> https://github.com/flipkart-incubator/hbase-orm (open source, Apache 2.0
> license)
>
> It helps Java applications interact with HBase in an object-oriented manner
> (by encapsulating HBase's Get, Put, Delete, Append and other classes).
>
> It's published on Maven Central:
>
> https://search.maven.org/search?q=g:com.flipkart%20AND%20a:hbase-object-mapper&core=gav
>
> Requesting you to add the above project here:
> https://hbase.apache.org/supportingprojects.html
>
> We believe this will greatly help other software developers using Apache
> HBase by expediting their software development.
>
> --
> Manu M.
> Software Architect
> https://www.linkedin.com/in/manumanjunath/
>


Re: how to connect to hbase through HTTPS proxy

2019-10-30 Thread Nick Dimiduk
Hi Bob,

To the best of my knowledge this is not possible using the standard HBase
client. The wire protocol is not HTTP, so all this goodness does not apply.

There is a “REST Gateway” that exposes an HBase cluster over HTTP.
Presumably you could proxy calls to a gateway instance (or a pool of
gateway instances). The architecture of the standard client differs from
the gateway setup: in the standard configuration the client will
communicate with region servers directly. Using the gateway results in the
gateway communicating with cluster members on behalf of the client.

https://hbase.apache.org/book.html#_rest
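
For illustration, a few common REST Gateway endpoints can be sketched as below. The gateway is assumed to have been started with `hbase rest start -p 8080`; host, port, table, and row key are placeholders, and the script only prints the curl invocations rather than issuing them.

```shell
# Sketch of REST Gateway calls an HTTP(S) proxy could sit in front of.
# Nothing here contacts a live cluster; the commands are printed only.
REST="http://localhost:8080"
calls=""
for path in /version/cluster /status/cluster /mytable/schema /mytable/row1; do
  calls="$calls
curl -H 'Accept: application/json' $REST$path"
done
echo "$calls"
```

Because these are ordinary HTTP requests, standard proxy settings (including basic authentication at the proxy) apply to them, unlike the native RPC wire protocol.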

Thanks,
Nick

On Wed, Oct 30, 2019 at 15:31 bob doe  wrote:

> I'm currently using org.apache.hadoop.hbase.client.ConnectionFactory to
> connect to HBase:
>
>
> import org.apache.hadoop.hbase.client.*;
> 
> Connection connection = ConnectionFactory.createConnection(conf);
> Table table = connection.getTable(TableName.valueOf("mytable"));
>
> If there is a HTTPS proxy in between the client and HBase server, is it
> possible to configure proxy host, port, basic authentication under
> Configuration object?
>
> I want to avoid applying proxy settings globally using -Dhttps.proxyHost
> and affect all HTTP connections.
>


[DISCUSS] Deprecate Review Board in favor of Github reviews

2019-11-13 Thread Nick Dimiduk
Heya,

Seems in the old days we were explicitly non-strict about where code review
were happening. I remember bouncing between Review Board and a Phabricator
instance (in addition to in-line patch reviews on JIRA). Now that we have
this fancy Gitbox and integration with GitHub, it seems we're making a
strong statement toward using Github PRs (in addition to in-line patch
reviews on JIRA) for our code review system. Is it worthwhile supporting
those older tools? I think maintaining the developer support tooling around
just these two mechanisms is plenty to keep up with.

I propose we make the move to GitHub PRs "official". This
basically involves updating the tome (here [0], here [1], probably others)
accordingly and sweeping the `dev-support` dir for old scripts.

Thoughts?

Thanks,
Nick

[0]: http://hbase.apache.org/book.html#developing
[1]: http://hbase.apache.org/book.html#reviewboard


Re: Requesting an Invite to the HBase Slack channel

2019-11-21 Thread Nick Dimiduk
Invitation sent.

On Thu, Nov 21, 2019 at 11:35 AM Narayanan PS 
wrote:

> I kindly request an invite to the HBase Slack channel (
> http://apache-hbase.slack.com/)
>
>
>
> --
> Regards,
> *Narayanan P S*
>


Re: Error Handling and Logging

2019-12-05 Thread Nick Dimiduk
I think these are good points all around. Can any of these anti-patterns be
flagged by a checkstyle rule? Static analysis would make the infractions
easier to track down.

One more point of my own: I’m of the opinion that we log too much in
general. Info level should not describe the details of operations as
normal. I’m also not a fan of logging data structure “status”messages, as
we do, for example, from the block cache. It’s enough to expose these as
metrics.

Thanks for speaking up! If you’re feeling ambitious, please ping me on any
PRs and we’ll get things cleaned up.

Thanks,
Nick

On Tue, Dec 3, 2019 at 21:02 Stack  wrote:

> Thanks for the helpful note David. Appreciated.
> S
>
> On Tue, Nov 26, 2019 at 1:44 PM David Mollitor  wrote:
>
> > Hello Team,
> >
> > I am one of many people responsible for supporting the Hadoop products
> out
> > in the field.  Error handling and logging are crucial to my success.
> I've
> > been reading over the code and I see many of the same mistakes again and
> > again.  I just wanted to bring some of these things to your attention so
> > that moving forward, we can make these products better.
> >
> > The general best-practice is:
> >
> > public class TestExceptionLogging
> > {
> >   private static final Logger LOG =
> > LoggerFactory.getLogger(TestExceptionLogging.class);
> >
> >   public static void main(String[] args) {
> > try {
> >   processData();
> > } catch (Exception e) {
> >   LOG.error("Application failed", e);
> > }
> >   }
> >
> >   public static void processData() throws Exception {
> > try {
> >   readData();
> > } catch (Exception e) {
> >   throw new Exception("Failed to process data", e);
> > }
> >   }
> >
> >   public static byte[] readData() throws Exception {
> > throw new IOException("Failed to read device");
> >   }
> > }
> >
> > Produces:
> >
> > [main] ERROR TestExceptionLogging - Application failed
> > java.lang.Exception: Failed to process data
> > at TestExceptionLogging.processData(TestExceptionLogging.java:22)
> > at TestExceptionLogging.main(TestExceptionLogging.java:12)
> > Caused by: java.io.IOException: Failed to read device
> > at TestExceptionLogging.readData(TestExceptionLogging.java:27)
> > at TestExceptionLogging.processData(TestExceptionLogging.java:20)
> > ... 1 more
> >
> >
> >
> > Please notice that when an exception is thrown, and caught, it is wrapped
> > at each level and each level adds some more context to describe what was
> > happening when the error occurred.  It also produces a complete stack
> trace
> > that pinpoints the issue.  For Hive folks, it is rarely the case that a
> > method consuming a HMS API call should itself throw a MetaException.  The
> > MetaException has no way of wrapping an underlying Exception and helpful
> > data is often lost.  A method may choose to wrap a MetaException, but it
> > should not be throwing them around as the default behavior.
> >
> > Also important to note is that there is exactly one place that is doing
> the
> > logging.  There does not need to be any logging at the lower levels.  A
> > catch block should throw or log, not both.  This is an anti-pattern and
> > annoying as the end user: having to deal with multiple stack traces at
> > different log levels for the same error condition.  The log message
> should
> > be at the highest level only.
> >
> > https://community.oracle.com/docs/DOC-983543#logAndThrow
> >
> > Both projects use SLF4J as the logging framework (facade anyway).  Please
> > familiarize yourself with how to correctly log an Exception.  There is no
> > need to log a thread name, a time stamp, a class name, or a stack trace.
> > The logging framework will do that all for you.
> >
> > http://www.slf4j.org/faq.html#paramException
> >
> > Again, there is no need to 'stringify' an exception. For example, do not
> > use this:
> >
> >
> >
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/StringUtils.java#L86
> >
> >
> > If you do however want to dump a stack trace for debugging (or trace)
> > purposes, consider performing the following:
> >
> > if (LOG.isDebugEnabled()) {
> >   LOG.debug("Dump Thread Stack", new Exception("Thread Stack Trace (Not
> an
> > Error)"));
> > }
> >
> > Finally, I've seen it a couple of times in Apache project that enabling
> > debug-level logging causes the application to emit logs at other levels,
> > for example:
> >
> > LOG.warn("Some error occurred: {}", e.getMessage());
> > if (LOG.isDebugEnabled()) {
> >   LOG.warn("Dump Warning Thread Stack", e);
> > }
> >
> > Please refrain from doing this.  The inner log statement should be at
> DEBUG
> > level to match the check.  Otherwise, when I enable DEBUG logging in the
> > application, the expectation that I have is that I will have the exact
> > logging as the INFO level, but I will also have additional DEBUG details
> as
> > well.  I am going to be using 'grep' to find DEBUG a
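
The wrap-versus-stringify distinction above can be demonstrated with plain JDK exceptions; a minimal sketch, assuming no SLF4J dependency:

```java
import java.io.IOException;

public class CauseDemo {
  public static void main(String[] args) {
    IOException root = new IOException("Failed to read device");
    // Anti-pattern: flattening the cause into a string severs the chain.
    Exception flattened = new Exception("Failed to process data: " + root);
    // The pattern recommended above: wrap, so getCause() still reaches the root.
    Exception wrapped = new Exception("Failed to process data", root);
    System.out.println(flattened.getCause());             // null
    System.out.println(wrapped.getCause().getMessage());  // Failed to read device
  }
}
```

This is exactly why an exception type that cannot carry a cause (as described for MetaException) loses the stack trace that pinpoints the original failure.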

Re: [DISCUSS] Deprecate Review Board in favor of Github reviews

2019-12-10 Thread Nick Dimiduk
Thanks all for the comments. I've filed
https://issues.apache.org/jira/browse/HBASE-23557 for hanging subtasks
related to this resolution.

Thanks,
Nick

On Wed, Nov 20, 2019 at 12:21 AM Mingliang Liu  wrote:

> bq. I propose we make the move to Github PR's "official". This
> basically involves updating the tome (here [0], here [1], probably others)
> accordingly and sweeping the `dev-support` dir for old scripts.
>
> +1 (non-binding) on this idea. Other than the doc and dev-support scripts,
> there is also a link to ReviewBoard in src/site/site.xml. We can put github
> over there I guess.
>
> On Thu, Nov 14, 2019 at 12:32 PM Zach York 
> wrote:
>
> > +1, much easier and available.
> >
> > On Thu, Nov 14, 2019 at 9:56 AM Geoffrey Jacoby 
> wrote:
> >
> > > +1 (non-binding), GitHub is a much better user experience, for both
> > > reviewers and contributors
> > >
> > > Geoffrey.
> > >
> > > On Thu, Nov 14, 2019 at 5:48 AM Guangxu Cheng 
> > > wrote:
> > >
> > > > +1
> > > > It will be more convenient to use Github PR.
> > > >
> > > > > 张铎 (Duo Zhang) wrote on Thu, Nov 14, 2019 at 6:17 PM:
> > > >
> > > > > +1,  ReviewBoard is almost dead now as it is only available to
> > > > > committers...
> > > > >
> > > > > > Jan Hentschel wrote on Thu, Nov 14, 2019 at 6:11 PM:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > I also like the GitHub way much more compared to ReviewBoard.
> > > > > >
> > > > > > From: Peter Somogyi 
> > > > > > Reply-To: "dev@hbase.apache.org" 
> > > > > > Date: Wednesday, November 13, 2019 at 6:23 PM
> > > > > > To: HBase Dev List 
> > > > > > Subject: Re: [DISCUSS] Deprecate Review Board in favor of Github
> > > > reviews
> > > > > >
> > > > > > +1
> > > > > >
> > > > > > Another issue with ReviewBoard is that it requires Apache ID so
> > only
> > > > > > committers are able to create new reviews or even comment.
> > > > > >
> > > > > > On Wed, Nov 13, 2019 at 5:21 PM Nick Dimiduk  wrote:
> > > > > >
> > > > > > Heya,
> > > > > >
> > > > > > Seems in the old days we were explicitly non-strict about where
> > code
> > > > > review
> > > > > > were happening. I remember bouncing between Review Board and a
> > > > > Phabricator
> > > > > > instance (in addition to in-line patch reviews on JIRA). Now that
> > we
> > > > have
> > > > > > this fancy Gitbox and integration with GitHub, it seems we're
> > making
> > > a
> > > > > > strong statement toward using Github PRs (in addition to in-line
> > > patch
> > > > > > reviews on JIRA) for our code review system. Is it worth while
> > > > supporting
> > > > > > those older tools? I think maintaining the developer support
> > tooling
> > > > > around
> > > > > > just these two mechanisms is plenty to keep up with.
> > > > > >
> > > > > > I propose we make the move to Github PR's "official". This
> > > > > > basically involves updating the tome (here [0], here [1],
> probably
> > > > > others)
> > > > > > accordingly and sweeping the `dev-support` dir for old scripts.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > Thanks,
> > > > > > Nick
> > > > > >
> > > > > > [0]: http://hbase.apache.org/book.html#developing
> > > > > > [1]: http://hbase.apache.org/book.html#reviewboard
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> L
>


Re: [apache/hbase] HBASE-23304: RPCs needed for client meta information lookup (#904)

2019-12-10 Thread Nick Dimiduk
Heya,

FYI, there are some new client-server RPC endpoints being proposed this is
patch. Probably this is of general interest. It's a subtask under
HBASE-18095: Provide an option for clients to find the server hosting META
that does not involve the ZooKeeper client

Thanks,
Nick

On Thu, Dec 5, 2019 at 6:11 PM Bharath Vissapragada <
notificati...@github.com> wrote:

> Next in the series of patches. This is a pretty simple one, just adds the
> needed RPCs and some sanity testing.
>
> @ndimiduk  @apurtell
>  @wchevreuil 
> @saintstack  @virajjasani
>  FYI, you might be interested in
> reviewing this.
>
>


Re: [DISCUSS] Cut branch-2.3 and EOL branch-2.1

2020-01-03 Thread Nick Dimiduk
On Fri, Jan 3, 2020 at 00:42 Duo Zhang  wrote:

> After "HBASE-23326 Implement a ProcedureStore which stores procedures in a
> HRegion" and related issues are resolved, I think the last flaky part of
> the proc-v2 framework has been resolved.


23326 is quite a present for the new year!

So the plan here is to cut branch-2.3, and start to stabilize it. And once
> 2.3.0 is out, I will do a final 2.1.x release and mark branch-2.1 as EOL.


+1

Suggestions are welcomed.


My primary suggestion is we need to make sure the rolling upgrade from 2.1
to 2.3 is as “turnkey” as possible. We already have the change to
persistent procedures from 2.2, which adds a hiccup. Let’s try to make this
more smooth for the under-staffed operations houses out there.

Oh, we need a release manager for branch-2.3. Any
> volunteers?


I volunteer to RM 2.3; I don’t mind taking another tour in the role.
Someone else speak up though if you feel you have a stronger claim.

Thanks,
Nick

>


Re: [DISCUSS] Bump hadoop versions

2020-01-03 Thread Nick Dimiduk
On Wed, Dec 25, 2019 at 5:38 PM 张铎(Duo Zhang)  wrote:

> We will only remove the hadoop 2.x support from hbase 3.x, which does not
> have a formal release plan yet, for 2.x we will still support hadoop 2.x.
>

Indeed there is no formal release plan for HBase-3.0, but I hope it's
sooner than 2+ years away! What's the motivation for dropping Hadoop2
support?

Wei-Chiu Chuang wrote on Thu, Dec 26, 2019 at 8:57 AM:
>
> > With my Hadoop hat's on, we have not yet officially declared Hadoop 2.8
> > EOL. I think the 2.8 download missing from the web page is just a
> mistake.
> >
> > That being said, some of the biggest Hadoop users (LinkedIn, Yahoo,
> > Microsoft) that I am aware of are moving up from 2.7/2.8 to 2.10, and
> that
> > 2.8.5 (the last version in the 2.8 line) was released in Sep 2018, more
> > than a year ago. It doesn't look like the community has the desire to
> > continue the 2.8 line.
> >
> > I think it is a little extreme to remove hadoop2 profile, given that
> Hadoop
> > 2.9 and 2.10 are still active and I expect Hadoop 2 to stay around for at
> > least 2 years out.
> >
> > On Thu, Dec 26, 2019 at 8:41 AM 张铎(Duo Zhang) 
> > wrote:
> >
> > > Hadoop 2.8.x has been removed from the download page of hadoop so I
> think
> > > it is time to bump the hadoop dependency to 2.9.x, on master an
> branch-2.
> > >
> > > And the hadoop community is going to make 2.10.x the last minor release
> > > line for 2.x
> > >
> > >
> > >
> >
> https://lists.apache.org/thread.html/cab84265d632b90d66dcd1ad957a7439a2c76a987c7e62feafb4812e%40%3Ccommon-dev.hadoop.apache.org%3E
> > >
> > >
> > > I think this is a sign that the community is moving forward to 3.x. So
> I
> > > propose we make the master branch hadoop3 only, This requires changing
> > the
> > > pom a bit to active hadoop3 profile by default and remove the hadoop2
> > > profile.
> > >
> > > Thoughts? Thanks.
> > >
> >
>


Re: [DISCUSS] Cut branch-2.3 and EOL branch-2.1

2020-01-06 Thread Nick Dimiduk
Filed https://issues.apache.org/jira/browse/HBASE-23652 .

On Sat, Jan 4, 2020 at 7:00 AM 张铎(Duo Zhang)  wrote:

> Nick Dimiduk wrote on Fri, Jan 3, 2020 at 11:47 PM:
>
> > On Fri, Jan 3, 2020 at 00:42 Duo Zhang  wrote:
> >
> > > After "HBASE-23326 Implement a ProcedureStore which stores procedures
> in
> > a
> > > HRegion" and related issues are resolved, I think the last flaky part
> of
> > > the proc-v2 framework has been resolved.
> >
> >
> > 23326 is quite a present for the new year!
> >
> > So the plan here is to cut branch-2.3, and start to stabilize it. And once
> > > 2.3.0 is out, I will do a final 2.1.x release and mark branch-2.1 as
> EOL.
> >
> >
> > +1
> >
> > Suggestions are welcomed.
> >
> >
> > My primary suggestion is we need to make sure the rolling upgrade from
> 2.1
> > to 2.3 is as “turnkey” as possible. We already have the change to
> > persistent procedures from 2.2, which adds a hiccup. Let’s try to make
> this
> > more smooth for the under-staffed operations houses out there.
> >
> Oh, thanks for your reminding. We do have a problem for upgrading, we
> should check whether there are unsupported procedures before migrating the
> procedure data on 2.3, as we want user to restart the 2.1 master first to
> finish the procedures, but as we have already migrated the procedure data,
> there is no way for the user to downgrade...
>
> Let me file an issue for this...
>
> >
> > Oh, we need a release manager for branch-2.3. Any
> > > volunteers?
> >
> >
> > I volunteer to RM 2.3; I don’t mind taking another tour in the role.
> > Someone else speak up though if you feel you have a stronger claim.
> >
> > Thanks,
> > Nick
> >
> > >
> >
>


Re: [DISCUSS] Guidelines for Code cleanup JIRAs

2020-01-10 Thread Nick Dimiduk
Thanks for bringing this up, Andrew.

I am also +1 for code cleanup. I agree that such efforts must hit the fork
branches of each release line, thus: master, branch-2, branch-1.

I’m -0 on taking such commits to release branches. These code lines should
be relatively static, only receiving bug fixes for their lifetime.
Cleanup under src/test being a notable exception to this point.

-n

On Fri, Jan 10, 2020 at 13:08 Sean Busbey  wrote:

> the link didn't work for me. here's another:
>
> https://s.apache.org/5yvfi
>
> Generally, I like this as an approach. I really value the clean up work,
> but cleanup / bug fixes that don't make it into earlier release lines then
> make my job as an RM who does backports more difficult especially when they
> touch a lot of code. I know we have too many branches, but just handling
> the major release lines means only 2 backports at the moment.
>
> I'd be happy with folks just noting a reason on the jira why something
> couldn't go back to branch-2 or branch-1 (e.g. when something requires
> JDK8).
>
> On Thu, Jan 9, 2020 at 2:12 PM Andrew Purtell  wrote:
>
> > Over on the Hadoop dev lists Eric Payne sent a great summary of
> discussions
> > that community has had on the tradeoffs involved with code cleanup
> issues,
> > and also provided an excellent set of recommendations.
> >
> > See the thread here: https://s.apache.org/fn5al
> >
> > I will include the top post below. I endorse it in its entirety as a
> > starting point for discussion in our community as well.
> >
> > >>>
> > There was some discussion on
> > https://issues.apache.org/jira/browse/YARN-9052
> > about concerns surrounding the costs/benefits of code cleanup JIRAs. This
> > email is to get the discussion going within a wider audience.
> >
> > The positive points for code cleanup JIRAs:
> > - Clean up tech debt
> > - Make code more readable
> > - Make code more maintainable
> > - Make code more performant
> >
> > The concerns regarding code cleanup JIRAs are as follows:
> > - If the changes only go into trunk, then contributors and committers
> > trying to
> >  backport to prior releases will have to create and test multiple patch
> > versions.
> > - Some have voiced concerns that code cleanup JIRAs may not be tested as
> >   thoroughly as features and bug fixes because functionality is not
> > supposed to
> >   change.
> > - Any patches awaiting review that are touching the same code will have
> to
> > be
> >   redone, re-tested, and re-reviewed.
> > - JIRAs that are opened for code cleanup and not worked on right away
> tend
> > to
> >   clutter up the JIRA space.
> >
> > Here are my opinions:
> > - Code changes of any kind force a non-trivial amount of overhead for
> other
> >   developers. For code cleanup JIRAs, sometimes the usability,
> > maintainability,
> >   and performance is worth the overhead.
> > - Before opening any JIRA, please always consider whether or not the
> added
> >   usability will outweigh the added pain you are causing other
> developers.
> > - If you believe the benefits outweigh the costs, please backport the
> > changes
> >   yourself to all active lines.
> > - Please don't run code analysis tools and then open many JIRAs that
> > document
> >   those findings. That activity does not put any thought into this
> > cost-benefit
> >   analysis.
> > <<<
> >
> > My preference is to port all the way back to at least branch-1. Those
> > interested in branch-1 maintenance and code lines derived from it, like
> > 1.3, 1.4, 1.5, and soon 1.6, can decide what to do once it lands in
> > branch-1, but we at least need the branch-1 backport as a starting point
> > addressing some of the major prerequisites: Hadoop 2 support, Java 7
> source
> > level, etc.
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >- A23, Crosstalk
> >
>


Re: [DISCUSS] Guidelines for Code cleanup JIRAs

2020-01-10 Thread Nick Dimiduk
On Fri, Jan 10, 2020 at 16:53 Andrew Purtell  wrote:

>
> "Please don't run code analysis tools and then open many JIRAs that
> document those findings. That activity does not put any thought into this
> cost-benefit analysis."
>
> On this latter point, it also includes trivial checkstyle nits and low
> priority findbugs findings.


I’m curious why you call out these tools specifically? We have both
integrated into our build. In the case of checkstyle, we control the
configuration and there are plugins for both Eclipse and IntelliJ — there’s
really no excuse for contributors ignoring the warnings they produce. If we
don’t like some class of warning, we should adjust the rule, not ignore the
check failure.

I agree there’s no real value in someone coming along, running a tool, and
opening a bunch of tickets. On the other hand, I very much appreciate Jan’s
recent efforts to address the checkstyle issues, module by module.

On Fri, Jan 10, 2020 at 4:45 PM Nick Dimiduk  wrote:
>
> > Thanks for bringing this up, Andrew.
> >
> > I am also +1 for code cleanup. I agree that such efforts must hit the
> > fork branches of each release line, thus: master, branch-2, branch-1.
> >
> > I’m -0 on taking such commits to release branches. These code lines
> > should be relatively static, only receiving bug fixes for their lifetime.
> > Cleanup under src/test being a notable exception to this point.
> >
> > -n
> >
> > On Fri, Jan 10, 2020 at 13:08 Sean Busbey  wrote:
> >
> > > the link didn't work for me. here's another:
> > >
> > > https://s.apache.org/5yvfi
> > >
> > > Generally, I like this as an approach. I really value the clean up
> work,
> > > but cleanup / bug fixes that don't make it into earlier release lines
> > then
> > > make my job as an RM who does backports more difficult especially when
> > they
> > > touch a lot of code. I know we have too many branches, but just
> handling
> > > the major release lines means only 2 backports at the moment.
> > >
> > > I'd be happy with folks just noting a reason on the jira why something
> > > couldn't go back to branch-2 or branch-1 (e.g. when something requires
> > > JDK8).
> > >
> > > On Thu, Jan 9, 2020 at 2:12 PM Andrew Purtell 
> > wrote:
> > >
> > > > Over on the Hadoop dev lists Eric Payne sent a great summary of
> > > discussions
> > > > that community has had on the tradeoffs involved with code cleanup
> > > issues,
> > > > and also provided an excellent set of recommendations.
> > > >
> > > > See the thread here: https://s.apache.org/fn5al
> > > >
> > > > I will include the top post below. I endorse it in its entirety as a
> > > > starting point for discussion in our community as well.
> > > >
> > > > >>>
> > > > There was some discussion on
> > > > https://issues.apache.org/jira/browse/YARN-9052
> > > > about concerns surrounding the costs/benefits of code cleanup JIRAs.
> > This
> > > > email is to get the discussion going within a wider audience.
> > > >
> > > > The positive points for code cleanup JIRAs:
> > > > - Clean up tech debt
> > > > - Make code more readable
> > > > - Make code more maintainable
> > > > - Make code more performant
> > > >
> > > > The concerns regarding code cleanup JIRAs are as follows:
> > > > - If the changes only go into trunk, then contributors and committers
> > > > trying to
> > > >  backport to prior releases will have to create and test multiple
> patch
> > > > versions.
> > > > - Some have voiced concerns that code cleanup JIRAs may not be tested
> > as
> > > >   thoroughly as features and bug fixes because functionality is not
> > > > supposed to
> > > >   change.
> > > > - Any patches awaiting review that are touching the same code will
> have
> > > to
> > > > be
> > > >   redone, re-tested, and re-reviewed.
> > > > - JIRAs that are opened for code cleanup and not worked on right away
> > > tend
> > > > to
> > > >   clutter up the JIRA space.
> > > >
> > > > Here are my opinions:
> > > > - Code changes of any kind force a non-trivial amount of overhead for
> > > other
> > > >   developers. For code cleanup JIRAs, sometimes the usability,
> > > > maintaina

Working toward 2.3.0

2020-01-27 Thread Nick Dimiduk
Heya,

I wanted to give an update on progress toward the first 2.3.0 RCs.

Jira has been cleaned up, with a bunch of unlikely tickets kicked out.
FixVersions have also been audited to ensure they match what's in git. On
first glance, this is shaping up to be a fat release, with over 800 issues
in this version [0]. However, it's not quite as scary as all that. I was
able to assemble a report of only the issues that are new to this version,
tickets that have not seen a previous release [1]. This list is shorter,
only about 175 currently, and when it's all said and done, I expect we'll
land somewhere on the order of 200.

There are yet a few major tickets to land before the release branch is cut.
The two bigger feature work items that I have my eyes on are HBASE-18095
and HBASE-22978. I've also marked HBASE-22972 as a blocker. If you have
other new feature work you're hoping to get into 2.3.0, please set the
fixVersion and ping me in the comments or your PR. If you have some spare
cycles, please take a pass through our open PR's [2]; there's quite a bit
of low-hanging fruit in there that could use some attention.

Our test situation is still looking a bit rough [3]. A couple brave souls
have started working through the flakies [4], but there's more work to be
done.

That's where we're at. With this amount of work, I expect that puts the
first RC sometime the week of Feb 10 or Feb 17. If folks want binaries they
can start to abuse before then, let me know and I'll see what can be done.

That's what I have for today. Please speak up with questions and concerns.
Preemptive thank you to everyone who's been pushing to bring this release
together. Your work has not gone unnoticed.

Thanks,
Nick

[0]: https://issues.apache.org/jira/projects/HBASE/versions/12344893
[1]: http://home.apache.org/~ndimiduk/new_for_branch-2.csv
[2]: https://github.com/apache/hbase/pulls
[3]:
https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2/
[4]:
https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2/lastSuccessfulBuild/artifact/dashboard.html


[DISCUSS] Hadoop dependency versions for 2.3

2020-02-03 Thread Nick Dimiduk
Hello,

I'd like to discuss the Hadoop versions we'll target for the 2.3 release
line. The topics up for discussion are: (1) what do we set as the
dependency versions in our poms as build defaults on each profile; and (2)
what is the breadth of testing to which we are able to commit for the
purposes of our compatibility matrix?

Currently our pom has:

<hadoop-two.version>2.8.5</hadoop-two.version>
<hadoop-three.version>3.1.2</hadoop-three.version>

Regarding our Hadoop 2 dependency, it seems the Hadoop project no longer
lists 2.8.x on their release page [0], though my searches have not
materialized an EOL announcement. There is a thread [1] suggesting that
there will be just one more release on 2.8 after 2.8.5, dates from
September 2019. Is this reason enough to bump forward our Hadoop-2
dependency, and if so, to what version? 2.9.2 seems a likely candidate,
however it looks like Duo's inquiry [2] as to the liveliness of that
release line has gone unanswered. 2.10.0 was released fairly recently, but
I've not seen anything to indicate that should be considered a stable
release. At this point, I'm prone to simply not touch it for 2.3.

Regarding our Hadoop 3 dependency, 3.1.2 is the latest version on that
release line. Since then, we've seen the advent of 3.2.x. I can find no
indication of the 3.2.x series being labeled as "not production ready."
There's talk of Hadoop 3.3, which will supposedly bring JDK11 support, but
I don't think it matches our timelines for HBase 2.3. Is there a reason to
advance our Hadoop 3 dependency? Likewise, at this point, I'm prone to
simply not touch it for 2.3.

Thoughts?

Thanks,
Nick

[0]: https://hadoop.apache.org/releases.html
[1]:
https://lists.apache.org/thread.html/ac7c53cf6f41d440d7ca120b2ea41fc5dc0f36041d4c03ee30d4e6d3%40%3Ccommon-dev.hadoop.apache.org%3E
[2]:
https://lists.apache.org/thread.html/0b1b5d80e6481796635c91e409dab0111387db3012d43357352108ec%40%3Ccommon-dev.hadoop.apache.org%3E


Re: [DISCUSS] HBASE-18095 - Remove zookeeper dependency in client codepaths

2020-02-04 Thread Nick Dimiduk
Thanks for the summary, and all your efforts on this enhancement, Bharath.

Let me highlight one point re: this architecture. Because the Master is
used in place of ZooKeeper for clients to find their way around the cluster,
this changes our operational requirements on the Master. With this feature
enabled, operators will need to keep at least one Master up and responding
to RPC's in order for clients to locate meta. As with the existing ZK-based
meta location discovery, the location is cached by the client so as to not
make excessive calls. There are tests in place that ensure this
functionality behaves appropriately even if Master leader election somehow
fails.

As for me, I'm +1 on merging this implementation to master and branch-2.

Thanks,
Nick

On Tue, Feb 4, 2020 at 8:37 AM Bharath Vissapragada 
wrote:

> Hello everyone,
>
> I'd like to kickoff a discuss thread on dev@ to see what folks think about
> merging the feature branch for HBASE-18095
> <https://issues.apache.org/jira/browse/HBASE-18095> into the master. For
> those of you who aren't following this work, over the last few months, a
> lot of effort went into a feature branch
> <
> https://github.com/apache/hbase/tree/HBASE-18095/client-locate-meta-no-zookeeper
> >
> to
> remove the ZK dependency in the client.
>
> *Please refer to the design doc
> <
> https://docs.google.com/document/d/1JAJdM7eUxg5b417f0xWS4NztKCx1f2b6wZrudPtiXF4/edit
> >
> attached to the parent jira and go through the subtasks for all the
> technical details and design considerations*.
>
> *TL;DR*: With this feature, the client connection implementation *does not*
> need access to zookeeper to fetch the connection metadata. Instead, a
> predefined set of master end points in the configuration are used by
> clients to fetch the required metadata.
>
> This new feature is *enabled by default on the feature branch* and passes
> the entire nightly test suite (modulo some known flakes not specific to the
> branch). At this point, I'm not aware of any performance concerns / feature
> gaps compared to original default implementation. The original registry
> implementation is still retained and can be used by setting the following
> client configuration. This kill switch gives the users more flexibility
> since they have a fallback incase they run into any issues.
>
> <property>
>   <name>hbase.client.registry.impl</name>
>   <value>org.apache.hadoop.hbase.client.ZKConnectionRegistry</value>
> </property>
>
> This work is also slated to go into the upcoming releases *2.3.0* and
> *1.6.0*. However, it will be *disabled by default*. Having this work back
> ported to those branches enables users to try it out in their environments
> and report any feedback.
>
> Please speak up (respond to this email) if there are any objections to
> merging this work in the master branch.
>
> Many thanks to Nick Dimiduk, Andrew Purtell and Michael Stack for their
> invaluable feedback throughout this work.
>
> - Bharath
>


Re: [VOTE] The first HBase 2.2.3 release candidate (RC1) is available

2020-02-04 Thread Nick Dimiduk
Heya Mighty Guanghao. Looks like the 2.2.3 release version in Jira wasn't
closed out. Probably needs a bit of love to make sure there are no stray
issues that wandered in since the release tag.

On Wed, Jan 15, 2020 at 12:10 AM Guanghao Zhang  wrote:

> With 4 binding +1s, the vote passes. Let me push out the release.
>
> Guanghao Zhang wrote on Wed, Jan 15, 2020 at 4:07 PM:
>
> > +1 from me (binding)
> >
> > hbase-2.2.3-bin.tar.gz (openjdk 1.8.0_202)
> > - Verified sha512sum: ok
> > - Start HBase in standalone mode: ok
> > - Verified with shell, create/disable/enable/drop/get/put/scan/delete: ok
> > - Checked master/regionserver/table/region Web UI: ok
> >
> > hbase-2.2.3-src.tar.gz (openjdk 1.8.0_202)
> > - Verified sha512sum: ok
> > - Build tarball: ok
> > - Start HBase in standalone mode: ok
> > - Verified with shell, create/disable/enable/drop/get/put/scan/delete: ok
> > - Checked master/regionserver/table/region Web UI: ok
> >
> > I ran TestRegionReplicas 3 times and passed. So this is a flaky test,
> too.
> >
> > Jan Hentschel wrote on Tue, Jan 14, 2020 at 2:48 AM:
> >
> >> +1 (binding)
> >>
> >> * Signature: ok
> >> * Checksum : ok
> >> * Rat check (1.8.0_202-ea): ok
> >>  - mvn clean apache-rat:check
> >> * Built from source (1.8.0_202-ea): ok
> >>  - mvn clean install -DskipTests
> >> * Unit tests pass (1.8.0_202-ea): ok
> >>  - mvn package -P runSmallTests
> >>
> >> From: Wellington Chevreuil 
> >> Reply-To: "dev@hbase.apache.org" 
> >> Date: Monday, January 13, 2020 at 7:35 PM
> >> To: dev 
> >> Subject: Re: [VOTE] The first HBase 2.2.3 release candidate (RC1) is
> >> available
> >>
> >> +1 (binding)
> >>
> >> * Signature: ok
> >> * Checksum : ok
> >> * Rat check (1.8.0_222): ok
> >> * Built from source (1.8.0_222): ok
> >>  - mvn clean install -DskipTests
> >> * Unit tests pass (1.8.0_222): ok
> >>  - mvn package -P runSmallTests
> >>
> >> Deployed a pseudo-distributed local install:
> >> - hbase shell create table, create namespace, put, scan, delete, get,
> >> list,
> > list_namespace: ok
> >> - hbase ltt 2.5M+ rows of 5KB each: ok
> >>
> >>
> >> Em sáb., 11 de jan. de 2020 às 00:48, Andrew Purtell <
> apurt...@apache.org
> >> >
> >> escreveu:
> >>
> >> +1
> >>
> >> * Signature: ok
> >> * Checksum : ok
> >> * Rat check (1.8.0_232): ok
> >> - mvn clean apache-rat:check
> >> * Built from source (1.8.0_232): ok
> >> - mvn clean install -DskipTests
> >> * Unit tests pass (1.8.0_232): one consistent failure (hang) -
> >> TestRegionReplicas - and several flakes (see below)
> >> - mvn package -P runAllTests
> >>
> >> Errors:
> >>
> >> This test consistently hangs. OS is MacOS Mojave (10.14.6). Java is
> >> OpenJDK
> >> 1.8.0_232-b18 (Zulu 8.42.0.21-CA-macosx). See stacktrace below.
> >>
> >> org.apache.hadoop.hbase.regionserver.TestRegionReplicas.null
> >>Run 1:
> >> TestRegionReplicas.testVerifySecondaryAbilityToReadWithOnFiles:476
> >> » TestTimedOut
> >>Run 2: TestRegionReplicas »  Appears to be stuck in thread
> >> Default-IPC-NioEventLoopGr...
> >>
> >> Flakes:
> >>
> >>
> >>
> >>
> org.apache.hadoop.hbase.client.TestFromClientSide3.testScanAfterDeletingSpecifiedRowV2
> >>Run 1: TestFromClientSide3.testScanAfterDeletingSpecifiedRowV2:253
> >> expected:<3> but was:<2>
> >>Run 2: PASS
> >>
> >> org.apache.hadoop.hbase.master.assignment.TestRegionMoveAndAbandon.test
> >>Run 1: TestRegionMoveAndAbandon.test:120 » Runtime
> >> org.apache.hadoop.hbase.client.Ret...
> >>Run 2: TestRegionMoveAndAbandon.test:120 » Runtime
> >> org.apache.hadoop.hbase.client.Ret...
> >>Run 3: PASS
> >>
> >>
> org.apache.hadoop.hbase.master.cleaner.TestLogsCleaner.testZooKeeperNormal
> >>Run 1: TestLogsCleaner.testZooKeeperNormal:281
> >>Run 2: PASS
> >>
> >>
> >>
> >>
> org.apache.hadoop.hbase.replication.regionserver.TestReplicator.testReplicatorWithErrors
> >>Run 1: TestReplicator.testReplicatorWithErrors:158 We did not
> replicate
> >> enough rows expected:<10> but was:<9>
> >>Run 2: PASS
> >>
> >>
> >>
> >>
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe.testScanAfterDeletingSpecifiedRowV2
> >> Run 1:
> >>
> >>
> >>
> TestFromClientSide3WoUnsafe>TestFromClientSide3.testScanAfterDeletingSpecifiedRowV2:253
> >> expected:<3> but was:<2>
> >> Run 2: PASS
> >>
> >> Stack trace of hanging test:
> >>
> >> "Time-limited test" #20 daemon prio=5 os_prio=31 tid=0x7fc1c13b9000
> >> nid=0x6503 waiting on condition [0x7000117d5000]
> >> java.lang.Thread.State: TIMED_WAITING (sleeping)
> >>  at java.lang.Thread.sleep(Native Method)
> >>  at
> >>
> >>
> >>
> org.apache.hadoop.hbase.regionserver.HRegionServer.sleep(HRegionServer.java:1503)
> >>  at
> >>
> >>
> >>
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionServerStatusStub(HRegionServer.java:2614)
> >>  - locked <0x000788051fe0> (a
> >> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServ

[ANNOUNCE] New HBase committer Bharath Vissapragada

2020-02-05 Thread Nick Dimiduk
On behalf of the Apache HBase PMC I am pleased to announce that Bharath
Vissapragada has accepted the PMC's invitation to become a committer on the
project. We appreciate all of Bharath's generous contributions thus far and
look forward to his continued involvement.

Allow me to be the first to congratulate and welcome Bharath into his new
role!

Thanks,
Nick


Re: [DISCUSS] Stop flagging Read Replication from next release

2020-02-06 Thread Nick Dimiduk
Hi Szabolcs,

Looks like that note was dropped in via HBASE-20830, about 2 years back.
From what we've seen on our clusters, kicking the tires with branch-2.2 and
branch-2, that was indeed the case. Based on that recent experience and
that of others, there are a number of fixes for Procedures and hbck2 that
better handle read replica regions. Try this git incantation [0] and look
for issues related to "unknown servers" and "RIT".

I'm not quite comfortable with removing the warning yet (more testing to be
done), but I'm hoping to get to that point, at least for our production
workloads, if not for 2.3.0, within the early releases of 2.3.x.

Maybe someone else who's using and abusing this feature on a branch-2
release could add their experience?

Thanks,
Nick

[0]: git log --oneline branch-2 --
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/ |
head -n20

On Thu, Feb 6, 2020 at 1:26 AM Szabolcs Bukros
 wrote:

> Hi!
>
> In the current Reference Guide the section "75. Timeline-consistent High
> Available Reads" is flagged as "maybe broken. Use it with caution". I'm not
> familiar with the original reason it was flagged but I have spent a few
> weeks working on this and after a few small fixes it looked stable enough.
> I think we should remove this warning for new 2.2+ releases. Below are some
> details about the fixes and the testing I did.
>
> Fixes:
> - HBASE-23589 FlushDescriptor contains non-matching family/output
> combinations 
> - HBASE-23601 OutputSink.WriterThread exception gets stuck and repeated
> indefinitely 
>
> Testing:
> After the fixes I run IntegrationTestRegionReplicaReplication for testing
> on a 4 machine cluster (3 RS, 30GB heap/RS). I used the default test
> parameters, only increased read_delay_ms to 6. The longest
> uninterrupted run I tried was 8 hours and I encountered no issues. Even
> adding in the chaos monkeys (slowDeterministic) hasn't revealed any new
> correctness issues with the feature.
>
> Next steps:
> - Further testing. I realize IntegrationTestRegionReplicaReplication
> provides a very uniform, unrealistic load, using different data could be
> interesting. If someone would find the time to run a few tests or propose
> some scenarios I would be grateful.
> - I was thinking of providing a cleaner flush logic on replication side,
> but my proposal might have too much overhead and the current logic while
> having issues works after the previous fixes. The proposal can be found in
> HBASE-23591, any feedback would be welcomed.
>
> Thoughts?
>


Re: [DISCUSS] Merge "HBASE-22514 Move rsgroup feature into core of HBase" back to master

2020-02-13 Thread Nick Dimiduk
Hi Duo,

I'll take a look. Are you wanting to get this in for 2.3.0?

Thanks,
Nick

On Tue, Feb 11, 2020 at 6:59 PM Duo Zhang  wrote:

> All the sub tasks have been done. The remaining things are release note and
> documentation update. Will do them soon.
>
> Here I want to hear more voices before I started the final vote, so I
> created a huge pull request
>
> https://github.com/apache/hbase/pull/1165
>
> PTAL.
>
> Thanks.
>


[DISCUSS] Defining a process for merging and back-porting feature branches

2020-02-19 Thread Nick Dimiduk
Hello,

We have had a couple feature branches in flight recently. I would like to
review our project policy regarding how we account for the merging of these
feature branches to master and other release line branches. There has been
some discussion on this topic around HBASE-18095, but I want to bring it to
light outside of that context. Whatever we decide here, we should write up
and include in the book.

By way of process, my preference is that when we merge a feature branch, we
retain all of the individual commits (one-to-one with Jira sub-tasks).
Mechanically this means something like the following:
  (1) squash together all commits that correspond with a single Jira
sub-task, making the history into one commit for one sub-task.
  (2) rebase the feature branch onto master;
  (3) create a PR from the feature branch into master;
  (4) use the "rebase and merge" option when merging the PR;
  (5) update the fixVersion of the umbrella and all sub-tasks to the
version of master;
  (6) repeat steps 2-5 for each back-port.
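The steps above can be sketched in a scratch repository. This is a toy illustration of steps (1) and (2) only; the branch and issue names (HBASE-00000/HBASE-00001) are invented for the example.

```shell
#!/usr/bin/env bash
# Scratch-repo sketch: squash the commits for one sub-task into a single
# commit, then rebase the feature branch onto master. Names are made up.
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master          # pin the default branch name
git config user.email rm@example.invalid         # throwaway identity
git config user.name  rm
c() { echo x >> "$1"; git add "$1"; git commit -q -m "$2"; }  # tiny commit helper

c base.txt "initial commit on master"
git checkout -q -b HBASE-00000                   # the feature branch
c sub1.txt "HBASE-00001 sub-task, part 1"
c sub1.txt "HBASE-00001 sub-task, part 2"

# (1) squash the two commits for the sub-task into one
git reset -q --soft HEAD~2
git commit -q -m "HBASE-00001 sub-task (squashed)"

# master moves on in the meantime; (2) rebase the feature branch onto it
git checkout -q master
c other.txt "unrelated commit on master"
git checkout -q HBASE-00000
git rebase -q master

# the feature branch is now one commit per sub-task atop current master
git log --oneline master..HBASE-00000
```

Steps (3)-(6) happen on GitHub and in Jira; the point of the sketch is only that, after (1) and (2), `master..HBASE-00000` contains exactly one commit per sub-task, which the "rebase and merge" option then preserves in the target branch's history.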

My reason for preferring preservation of sub-commit history is that, in the
event of follow-on addendums and sub-tasks (something we have a habit of
doing), it's easy for release line maintainers to account for which of those
follow-ons have been applied to their branches of interest. If the "squash
and merge" option is chosen, it becomes much more difficult for a release
manager (or indeed, curious historians) to identify exactly which Jira
issues are present in the history.

My reason for preferring PRs for merging feature branches (and back-ports)
over a developer pushing manually is that it gives the maintainer an
opportunity to benefit from the pre-commit robot, and
back-port-branch-specific discussion to occur in the context of the code
changes proposed.

There are certainly other ways of going about this. I'm curious what others
think of the above.

Thanks,
Nick


Running HBase on Hadoop 3.1, was Re: [ANNOUNCE] Apache HBase 2.1.9 is now available for download

2020-02-20 Thread Nick Dimiduk
changed subject, dropped dev

Hello Mich,

The HBase binary distribution includes all Hadoop client jars necessary for
HBase to function on top of HDFS. The version of those Hadoop jars is that
of Hadoop 2.8.5. Duo is saying that the Hadoop 2.8.5 client works against a
HDFS 3.1.x cluster. Thus, this binary release of HBase is expected to work
with HDFS 3.1.

Alternatively, you can choose to recompile HBase using Hadoop 3.1.x as the
dependency. In that case, you'll need to retrieve the source, establish a
build environment, and perform the build yourself -- the HBase project does
not distribute a binary artifact compiled against Hadoop 3.
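A sketch of what that rebuild looks like, assuming the `-Dhadoop.profile=3.0` profile and `hadoop-three.version` property from the HBase build setup; the Hadoop version below is a placeholder, so check the pom for the values supported by your source tree.

```shell
#!/usr/bin/env bash
# Sketch only: compose the Maven invocations for building HBase against a
# Hadoop 3 line. Adjust HADOOP3_VERSION for your target cluster.
set -euo pipefail
HADOOP3_VERSION="3.1.4"   # assumed target version

build="mvn clean install -DskipTests -Dhadoop.profile=3.0 -Dhadoop-three.version=${HADOOP3_VERSION}"
tarball="mvn -DskipTests package assembly:single -Dhadoop.profile=3.0 -Dhadoop-three.version=${HADOOP3_VERSION}"

echo "build:   ${build}"
echo "tarball: ${tarball}"
```

Run the `build` command from the HBase source checkout; the `tarball` command then produces a binary assembly compiled against the chosen Hadoop 3 release.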

Thanks,
Nick

On Thu, Feb 20, 2020 at 3:20 AM Mich Talebzadeh 
wrote:

> Thank you.
>
> In your points below
>
> If you want to run HBase on top of a 3.1.x HDFS cluster, you can just use
> the binaries, hadoop 2.8.5 client can communicate with 3.1.x server.
>
> If you want HBase to use hadoop 3.1.x, then you need to build the binaries
> by yourself, use the hadoop 3 profile.
>
> Could you please clarify (since Hbase can work with Hadoop), which version
> of new release has been tested against which version of Hadoop.
>
> Also it is not very clear which binaries you are referring to? "Hadoop
> 2.8.5 client can communicate with "Hadoop" 3.1 server. So basically
> download Hadoop 2.8.5 client for Hbase use?
>
>
> Regards,
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Thu, 20 Feb 2020 at 09:45, 张铎(Duo Zhang)  wrote:
>
> > If you want to run HBase on top of a 3.1.x HDFS cluster, you can just use
> > the binaries, hadoop 2.8.5 client can communicate with 3.1.x server.
> >
> > If you want HBase to use hadoop 3.1.x, then you need to build the
> binaries
> > by yourself, use the hadoop 3 profile.
> >
> > Mich Talebzadeh  于2020年2月20日周四 下午5:15写道:
> >
> > > Hi,
> > >
> > > Thanks.
> > >
> > > Does this version of Hbase work with Hadoop 3.1? I am still stuck with
> > > Hbase 1.2.7
> > >
> > > Hadoop 3.1.0
> > > Source code repository https://github.com/apache/hadoop -r
> > > 16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d
> > > Compiled by centos on 2018-03-30T00:00Z
> > > Compiled with protoc 2.5.0
> > >
> > > Regards,
> > >
> > >
> > >
> > >
> > > On Thu, 20 Feb 2020 at 08:48, Duo Zhang  wrote:
> > >
> > > > The HBase team is happy to announce the immediate availability of
> > Apache
> > > > HBase 2.1.9.
> > > >
> > > > Download from https://hbase.apache.org/downloads.html
> > > >
> > > > Apache HBase is an open-source, distributed, versioned,
> non-relational
> > > > database. Apache HBase gives you low latency random access to
> billions
> > of
> > > > rows with millions of columns atop non-specialized hardware. To learn
> > > more
> > > > about HBase, see https://hbase.apache.org/.
> > > >
> > > > HBase 2.1.9 is the latest release of the HBase 2.1 line, continuing
> on
> > > the
> > > > theme of bringing a stable, reliable database to the Apache Big Data
> > > > ecosystem and beyond. 2.1.9 includes ~62 bug and improvement fixes
> done
> > > > since the 2.1.8.
> > > >
> > > > For instructions on verifying ASF release downloads, please see
> > > >
> > > > https://www.apache.org/dyn/closer.cgi#verify
> > > >
> > > > Project member signature keys can be found at
> > > >
> > > > https://www.apache.org/dist/hbase/KEYS
> > > >
> > > > Thanks to all the contributors who made this release possible!
> > > >
> > > > Best,
> > > > The HBase Dev Team
> > > >
> > >
> >
>


[DISCUSS] Drop support for code contribution via Jira attached patch

2020-02-21 Thread Nick Dimiduk
Heya, and happy Friday.

I would like to propose that we drop support for receiving contributions by
way of attaching a patch file to a Jira issue. From my perspective, in the
face of modern interfaces for PR-style review, this is an "archaic" form of
contribution that is "actively harmful" to project health. I'm being
intentionally divisive, but here's my argument. The "state of the art,"
"modern" "best practices" revolve around a PR (GitHub/GitLab/Gerrit/&c.)
-style review process that enjoys an intentionally designed user experience
and has many points of friction reduced or removed entirely.

When a patch arrives via Jira attachment, it passes through a process that
suffers from a higher level of friction, something that actively
discourages the level of code review that it could have received via PR. I
believe that reduced friction results is a more thorough, thoughtful review
and a review that's better able to handle large change-sets, vs. the
attached patch process. Specific friction points include:
 * applying a patch for local evaluation is more tedious than checking out
a PR branch.
 * each comment requires the reviewer to provide their own context
(reference line numbers, copy-paste code snippets, &c) before saying
anything at all.
 * threaded discussion around individual comments is impossibly difficult
to manage for both participants and casual observers.
 * a super-human level of attention to detail is required on the parts of
both contributor and reviewer to ensure that all review comments have been
addressed.
 * syntax highlighting (while rudimentary vs. IDE-based evaluation) makes
patches easier to read and digest.

I claim "actively harmful" compared to the alternative because the above
minor friction points, taken together, discourages the higher quality
reviews that are possible by way of (1) the git-based interface to the
contributed content and (2) a commenting system that supports contextual,
threaded, and resolvable comments.

The primary counter argument I've come up with is based around user access.
It's possible that there's a contributor who has an Apache JIRA ID but not
a GitHub ID, who is unwilling to make an account on the non-Apache service.
Not accepting issue-attached patches means we exclude that contributor from
our community. However, I suspect that the number of active and potential
contributors who fall into that bucket is at or approaching zero. I
suspect that the world of potential contributors who have a GitHub ID but
refuse to make an Apache JIRA ID is actually far greater.

Thus I propose we discontinue accepting patches attached to Jira issues;
our contributor guide would exclusively ask for a PR; we can shut down the
pre-commit robot from scanning Jira.

Thanks in advance for your thoughtful participation in this discussion.
Nick


Re: [DISCUSS] Defining a process for merging and back-porting feature branches

2020-02-21 Thread Nick Dimiduk
Whatever we write up in our contributors section is ultimately a
recommended workflow, not fixed in stone. Until such time as we automate
entirely the contribution process, there will always be an element of human
discretion involved.

That said, I generally echo Andrew's concern re: alert fatigue.

On Fri, Feb 21, 2020 at 9:25 AM Wellington Chevreuil <
wellington.chevre...@gmail.com> wrote:

> >
> > It could work of course but has downsides. Email explosion. How do I as
> > committer know what order to apply them in? Confusion as random reviewers
> > see some of the changes in some of the PRs but miss other changes in
> others.
> >
> Was not suggesting it to be the norm, but something to keep as an
> alternative approach. The commit dependency can be tracked via jira, but
> yeah, a bit harder then if all required commits are in same PR.
>
> Em sex., 21 de fev. de 2020 às 16:55, Andrew Purtell <
> andrew.purt...@gmail.com> escreveu:
>
> > I don’t think a PR per JIRA per backport is what we want. It could work
> of
> > course but has downsides. Email explosion. How do I as committer know
> what
> > order to apply them in? Confusion as random reviewers see some of the
> > changes in some of the PRs but miss other changes in others.
> >
> >
> > > On Feb 21, 2020, at 4:13 AM, Wellington Chevreuil <
> > wellington.chevre...@gmail.com> wrote:
> > >
> > > 
> > >>
> > >>
> > >> we retain all of the individual commits (one-to-one with Jira
> > sub-tasks).
> > >> Mechanically this means something like the following:
> > >>  (1) squash together all commits that correspond with a single Jira
> > >> sub-task, making the history into one commit for one sub-task.
> > >>
> > >
> > > How about also add the flexibility to do individual PRs for each jira
> > > sub-task? That would still allow for tracking a commit per Jira. It
> > > wouldn't be as easy for tracking backports, but there may be
> scenarios
> > > where would make sense to commit individual jira sub-tasks ahead of the
> > > whole, completed feature.
> > >
> > >> Em qua., 19 de fev. de 2020 às 20:05, Andrew Purtell <
> > >> andrew.purt...@gmail.com> escreveu:
> > >>
> > >> The committer guide in the book says squash merges are always
> preferred.
> > >> When I was faced with a backport PR I referred to this and opted for
> > >> squash-and-merge rather than rebase-and-merge as consequence. Let’s
> > update
> > >> that guidance with the extra process detail for backports of features
> > >> spanning multiple commits/JIRAs.
> > >>
> > >>>> On Feb 19, 2020, at 8:25 AM, Sean Busbey  wrote:
> > >>>
> > >>> So long as the backport PRs are lazy consensus instead of the RTC
> that
> > >>> a PR generally implies (and the original branch went through) then
> > >>> this all reads as in line with my own preferences and what we've
> > >>> mostly done historically.
> > >>>
> > >>>> On Wed, Feb 19, 2020 at 10:08 AM Nick Dimiduk 
> > >> wrote:
> > >>>>
> > >>>> Hello,
> > >>>>
> > >>>> We have had a couple feature branches in flight recently. I would
> like
> > >> to
> > >>>> review our project policy regarding how we account for the merging
> of
> > >> these
> > >>>> feature branches to master and other release line branches. There
> has
> > >> been
> > >>>> some discussion on this topic around HBASE-18095, but I want to
> bring
> > >> it to
> > >>>> light outside of that context. Whatever we decide here, we should
> > write
> > >> up
> > >>>> and include in the book.
> > >>>>
> > >>>> By way of process, my preference is that when we merge a feature
> > >> branch, we
> > >>>> retain all of the individual commits (one-to-one with Jira
> sub-tasks).
> > >>>> Mechanically this means something like the following:
> > >>>> (1) squash together all commits that correspond with a single Jira
> > >>>> sub-task, making the history into one commit for one sub-task.
> > >>>> (2) rebase the feature branch onto master;
> > >>>> (3) create a PR from the feature branch into master;
> > >> (4) use the "rebase and merge" option when merging the PR;

Re: [DISCUSS] Drop support for code contribution via Jira attached patch

2020-02-24 Thread Nick Dimiduk
> The bot is for lots of projects, not only HBase, so we can not shut it
down.

Pardon me Duo. I am proposing we terminate PreCommit-HBASE-Build. The
PreCommit-Admin job appears to be the responsibility of Yetus and is out of
scope of my discussion here.

> There's already debt on that job; Nick has been starting to pay it down
and I'd imagine that's how we got to this thread about just turning it off.

Sean is calling me out here, and it's correct.

I have recently been looking to add JDK11 support to all of our CI jobs
(HBASE-23875 and friends), and Jira-attached patch files is the one that
demands the most work, just to get up to be on par with the rest. See
HBASE-23874 and HBASE-23879 as prerequisites. And then today I uncovered
this little gem: it looks like PreCommit-HBASE-Build is broken for all but
whatever is the first attachment file recognized as a patch.
https://issues.apache.org/jira/browse/HBASE-23888

Final nail in the coffin is that it seems the global PreCommit-Admin job is
not actually running on its intended 10-minute interval. It broke today
and no one noticed until I went looking. When it does run, it's somehow
missing newly attached files. This is evident by way of the output of the
last couple job runs missing Issue/Attachment pairs for the various
attachments on HBASE-23874.

So yes, I do believe that a PR results in a better code review. I'm also
selfishly interested in cutting work that we don't *have* to do, supporting
a process that isn't used by the majority of our contributors.

On Sat, Feb 22, 2020 at 6:52 AM Sean Busbey  wrote:

> There's a dedicated precommit job just for the hbase main repo. We can shut
> it down without impacting other projects or even other hbase repos. There's
> already debt on that job; Nick has been starting to pay it down and I'd
> imagine that's how we got to this thread about just turning it off. For
> example branch 1 handling is currently broken for the jira patch tester but
> not the GitHub PR tester.
>
> I personally don't find it more difficult to evaluate either kind of
> contribution because I rely on Apache Yetus Smart Apply Patch which "just
> works" most of the time given just a jira key to grab either patch or PRs.
>
> However, I definitely agree with the much more difficult commenting
> workflow with just jira. Personally that means if my feedback is going to
> be more than "lgtm" then I ask the contributor to shift to a PR.
>
> I looked at runs of the jira patch precommit bot over Feb and most of them
> look like either duplicate work where it ended up testing a GitHub PR that
> the dedicated PR precommit also tested or an old patch getting retested due
> to age off of the "did I already test this" check of the coordinating job.
>
> There was a contribution from a first time contributor, but afaict they
> were following the ref guide rather than intentionally avoiding github.
>
> I'm also strongly in favor of just shifting to PRs.
>
>
> On Sat, Feb 22, 2020, 05:53 张铎(Duo Zhang)  wrote:
>
> > The bot is for lots of projects, not only HBase, so we can not shut it
> > down.
> >
> > My concern is that, do we really need to shut it down completely? The pre
> > commit job is still fine I think? So just leave it as is? And when
> > sometimes it is broken and needs a lot of effort to maintain, then we can
> > shut it down?
> >
> > Wellington Chevreuil 于2020年2月22日
> > 周六19:12写道:
> >
> > > +1, can only see benefits of using GitHub PRs over attached patches.
> > >
> > >
> > > On Fri, 21 Feb 2020, 21:33 Andrew Purtell, 
> wrote:
> > >
> > > > +1 to the idea, having one contribution workflow instead of two is
> 50%
> > > less
> > > > confusing (or 100% depending how you count it).
> > > >
> > > > > applying a patch for local evaluation is more tedious than checking
> > > out a
> > > > PR branch.
> > > >
> > > > Except it's not necessary to download and apply the patch to evaluate
> > it,
> > > > we have precommit for both workflows. But that brings up something
> you
> > > > didn't mention, which is having two precommit workflows, one for
> JIRA,
> > > one
> > > > for PRs, can be burdensome.
> > > >
> > > >
> > > > On Fri, Feb 21, 2020 at 1:28 PM Nick Dimiduk 
> > > wrote:
> > > >
> > > > > Heya, and happy Friday.
> > > > >
> > > > > I would like to propose that we drop support for receiving
> > > contributions
> > > > by
> >

Re: [DISCUSS] shade Jetty

2020-02-24 Thread Nick Dimiduk
I am in favor of shading Jetty as well, if we can. The caveat being "if we
can".

On Mon, Feb 24, 2020 at 2:11 PM Wei-Chiu Chuang
 wrote:

> Forgot to share a few past attempts:
>
>    1. HBASE-18224 Upgrade jetty
>
>    2. HBASE-19390 Revert to older version of Jetty 9.3
>
>    3. HBASE-19256 [hbase-thirdparty] shade jetty
>
>
> On Mon, Feb 24, 2020 at 2:06 PM Wei-Chiu Chuang 
> wrote:
>
> > Hi,
> >
> > While I work on this jira HBASE-23834
> >  (HBase fails to run
> > on Hadoop 3.3.0/3.2.2/3.1.4 due to jetty version mismatch) and I realized
> > this was attempted before. But it simply doesn't work when you have
> Hadoop
> > and HBase on different Jetty minor versions (9.3 / 9.4) unless Jetty is
> > shaded in HBase (or Hadoop).
> >
> > We should update Jetty in HBase for sure. 9.3 has known security
> > vulnerabilities and not fixed until 9.4.
> >
> > Given that hbase-thirdparty is the standard practice to place
> > thirdparty jars, should we also shade Jetty into hbase-thirdparty?
> >
> >
>


Re: [VOTE] Merge "HBASE-22514 Move rsgroup feature into core of HBase" back to master

2020-03-02 Thread Nick Dimiduk
> And on backporting to branch-2, I think this is all to Nick as he is the
release manager.

I'd like to take it into branch-2 as soon as we can. We've done a major
round of stabilization of branch-2, but that concluded before the winter
break. Now that we've had a handful of major features land, I'm
anticipating another round of stabilization in the coming weeks.

The reasons I can think of to NOT back-port it for 2.3 are the following:
 1. Does not conform with our minor release compatibility "promises".
 2. Introduces significant changes to the assignment manager.
 3. Introduces known complications with JDK11.

Unfortunately I've still not had time to review the meat of the patch. I
expect there to be changes to the AM, but I hope those changes are isolated
and not systemic.

If there are significant reviewer concerns AND someone is up for managing
the overhead, what do you think about maintaining a back port branch that
is regularly rebased onto branch-2? We can start our stabilization efforts
on that feature branch. If it's looking good, the merge will be trivial. If
it's problematic, the feature can receive further attention and we've not
destabilized the pending release.

This is begging another unaddressed question -- are we going to continue
branching for the 2.x minor release lines? Will we release directly from
branch-2, as we have started with branch-1?

Thanks,
Nick

On Mon, Mar 2, 2020 at 8:59 AM Andrew Purtell  wrote:

> +1
>
> Also, +1 to putting this in 2.4. Will give us one of hopefully several
> reasons to keep moving forward. No need to delay the 2.3 release train.
>
> I'd like to try to pick up the backport of this at my employer as part of
> adopting 2.4 in some way, for what it's worth. I think maybe 2.4 for us for
> this reason (rsgroups improvements!!) but also some minor,
> minor-release-requiring changes to coprocessor APIs. Will discuss the latter
> point with you soon on a JIRA issue.
>
> On Mon, Mar 2, 2020 at 6:12 AM Sean Busbey  wrote:
>
> > Personally, I'd rather see the branch-2 backport wait for 2.4. the 2.3
> > release has been "close" for a while now and 2.2 came out in June
> > 2019.
> >
> > On Mon, Mar 2, 2020 at 1:16 AM 张铎(Duo Zhang) 
> > wrote:
> > >
> > > Thanks stack, so finally we have 3 binding +1s now.
> > >
> > > Let merge the branch back. And on backporting to branch-2, I think this
> > is
> > > all to Nick as he is the release manager.
> > >
> > > Thanks!
> > >
> > > Stack  于2020年3月2日周一 下午1:40写道:
> > >
> > > > I'm +1 on backport. Will keep an eye on it.
> > > > S
> > > >
> > > > On Sat, Feb 22, 2020 at 5:32 AM Duo Zhang 
> wrote:
> > > >
> > > > > The issue aims to make rs group the first class citizen in HBase,
> > where
> > > > the
> > > > > feature can be enabled through a simple flag, not a complicated
> > > > > coprocessor, and also we can manage it through the Admin interface,
> > while
> > > > > in the old time the only public way is to through the shell
> command,
> > as
> > > > the
> > > > > coprocessor client is marked as IA.Private.
> > > > >
> > > > > This is a simple design doc
> > > > >
> > > > > 
> > > > >
> > > > >
> > > >
> >
> https://docs.google.com/document/d/1SuodZ_uDQQQVJyryRxqp033cgz2aQPJmjIREbbbmB3c/edit?usp=sharing
> > > > >
> > > > > The PR for all the changes
> > > > >
> > > > > https://github.com/apache/hbase/pull/1165
> > > > >
> > > > > And let me copy the release note here
> > > > >
> > > > > Moved rs group feature into core. Use this flag to enable or
> disable
> > it.
> > > > >
> > > > > The coprocessor
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint
> > is
> > > > > deprecated, but for compatibility, if you want the pre 3.0.0 hbase
> > > > client/shell
> > > > > to communicate with the new hbase cluster, you still need to add
> this
> > > > > coprocessor to master. And if this coprocessor is specified, the
> > above
> > > > flag
> > > > > will be set to true automatically to enable rs group feature.
> > > > >
> > > > > These methods are added to the Admin/AsyncAdmin interface for
> > managing rs
> > > > > groups. See the javadoc of these methods for more details.
> > > > >
> > > > >   void addRSGroup(String groupName) throws IOException;
> > > > >   RSGroupInfo getRSGroup(String groupName) throws IOException;
> > > > >   RSGroupInfo getRSGroup(Address hostPort) throws IOException;
> > > > >   RSGroupInfo getRSGroup(TableName tableName) throws IOException;
> > > > >   List<RSGroupInfo> listRSGroups() throws IOException;
> > > > >   List<TableName> listTablesInRSGroup(String groupName) throws
> > > > IOException;
> > > > >   Pair<List<String>, List<TableName>>
> > > > > getConfiguredNamespacesAndTablesInRSGroup(String groupName) throws
> > > > > IOException;
> > > > >   void removeRSGroup(String groupName) throws IOException;
> > > > >   void removeServersFromRSGroup(Set<Address> servers) throws
> > IOException;
> > > > >   void moveServersToRSGroup(Set<Address> servers, String targetGroup)
> > > > > throws IOException;
> > > > >   void setRSGroup(Set<TableName> tables, String groupName) throws IOException;

[DISCUSS] Branching strategy for ongoing branch-2 releases

2020-03-03 Thread Nick Dimiduk
Hello,

What is the current thinking around branch-2 releases? It seems branch-1 is
doing away with "a branch per minor release" strategy. I'm curious if we
should be doing the same on branch-2. To summarize the argument for, as I
see it,

The pros:
 - consistent model for both active release lines.
 - encourages more rapid release of new features and the minor releases
that carry them.
 - reduces developer overhead managing back ports.

The cons:
 - difficult to "stabilize" a minor release line.
 - complex "timing" issues when back-porting new features from master.
 - more painful to produce patch releases.

What other bullets did I miss?

I am personally in favor of this approach. I think it provides two major
benefits: increase the velocity of feature releases and raise the quality
bar on commits. My counter-arguments to the cons are:

> difficult to "stabilize" a minor release line

I argue that this is a process issue, and should be addressed before
patches land. We -- the community -- need to help contributors to validate
their changes while they are in the PR process and sit on feature branches,
before they arrive on the release branch. I argue this should happen before
they hit master as well, but it seems that branch has some tech debt that
needs to be addressed first.

> complex "timing" issues when back-porting new features from master.

We already have this, today, when committers coordinate with a release
manager before merging their patches.

> more painful to produce patch releases.

Okay, I don't think this becomes *that* painful, but there is increased
friction. If the community decides the development branch isn't ready for a
release, it becomes the responsibility of the release manager to create a
temporary branch, cherry-pick back any changes that are necessary, tag, and
go. Once the release tag lands, the temporary branch is discarded. In
practice, I think it's not terribly common for new features (that warrant
increasing the minor version number) to come along back-to-back in close
succession, such that they cannot be timed into the same minor release.

Other concerns?

Thanks,
Nick


Re: [DISCUSS] Branching strategy for ongoing branch-2 releases

2020-03-04 Thread Nick Dimiduk
> What will be the criterion for a new patch release with this model?

From my previous RM experience, I really liked the process of the RM coming
around once/month, check the branch for activity, and if there's enough
fixes to justify, do a release. In my proposed model, that monthly cadence
doesn't change. The RM would scan Jira for the release line branch
(branch-2, in this case), looking for bug fixes. They create a "patch
release branch" from the minor version's last release tag, queue up
anything that looks relevant onto a temporary "patch release branch". They
cut the RC from that branch. When an RC is accepted, the permanent tag is
pushed and the "patch release branch" is deleted.

> Stabilizing a branch with lots of new features is just human-impossible.

Well, not human-impossible, but I agree that it is a lot of work. We do
this ahead of every minor release, so what changes? Isn't it better to do
the stabilization on branch-2 (or master) than after a minor release line
branch is cut? Isn't it better to do the stabilization on a feature branch
of that big new feature, before it gets merged?

> And this is also why we use semantic versioning. Major for big
incompatible features, minor for almost compatible features, and patch with
almost no new features.

Nothing I've proposed here is against or incompatible with semantic
versioning.

> So for me, if we think 2.3.x will be the last branch-2 minor release line,
and new features are not expected to be backported to branch-2 by default,
then it is OK that we just make release from branch-2.

I'm not at all suggesting that branch-2 should end with 2.3. What I'm
suggesting is that we shift our mentality of "back-port this patch to
branch-2 when it's dev complete" to "back-port this patch to branch-2 when
it's production ready." IMHO, we already treat master as the "dev complete"
branch, so what's the benefit of doing so also with branch-2?

Thanks,
Nick

On Tue, Mar 3, 2020 at 4:01 PM 张铎(Duo Zhang)  wrote:

> I’m -1 on doing the same on branch-2.
>
> Stabilizing a branch with lots of new features is just human-impossible.
> Even if you run ITBLL for every commit, you may still miss something.
> Welcome to the distributed system world. And this is also why we use
> semantic versioning. Major for big incompatible features, minor for almost
> compatible features, and patch with almost no new features.
>
> IIRC for branch-1, the conclusion is that, new features are not likely to
> be backported by default, and no one actually takes care of branch-1, so to
> reduce the work for release managers 1.x, we decided to make release
> directly on branch-1.
>
> So for me, if we think 2.3.x will be the last branch-2 minor release line, and
> new features are not expected to be backported to branch-2 by default, then
> it is OK that we just make release from branch-2. But I do not think this
> is the truth, people are still discussing how to land the rsgroup changes
> on branch-2 and think it should be landed for 2.4.x...
>
> Thanks.
>
> Nick Dimiduk wrote on Wed, Mar 4, 2020 at 02:45:
>
> > Hello,
> >
> > What is the current thinking around branch-2 releases? It seems branch-1
> is
> > doing away with "a branch per minor release" strategy. I'm curious if we
> > should be doing the same on branch-2. To summarize the argument for, as I
> > see it,
> >
> > The pros:
> >  - consistent model for both active release lines.
> >  - encourages more rapid release of new features and the minor releases
> > that cary them.
> >  - reduces developer overhead managing back ports.
> >
> > The cons:
> >  - difficult to "stabilize" a minor release line.
> >  - complex "timing" issues when back-porting new features from master.
> >  - more painful to produce patch releases.
> >
> > What other bullets did I miss?
> >
> > I am personally in favor of this approach. I think it provides two major
> > benefits: increase the velocity of feature releases and raise the quality
> > bar on commits. My counter-arguments to the cons are:
> >
> > > difficult to "stabilize" a minor release line
> >
> > I argue that this is a process issue, and should be addressed before
> > patches land. We -- the community -- need to help contributors to
> validate
> > their changes while they are in the PR process and sit on feature
> branches,
> > before they arrive on the release branch. I argue this should happen
> before
> > they hit master as well, but it seems that branch has some tech debt
> that
> > needs to be addressed first.
> >
> > > complex "timing" iss

Re: [DISCUSS] Drop support for code contribution via Jira attached patch

2020-03-04 Thread Nick Dimiduk
> If it is not easy to add JDK11 support for the PreCommit-HBASE-Build job
then I'm +1 with stopping it.

It's a bit of work to bring Jira PreCommit up to feature parity with GitHub
PreCommit, especially seeing the work that was necessary to get GitHub
PreCommit to do JDK11. From what I can tell, the work includes:

HBASE-23879 Convert Jira attached patch precommit to a Jenkinsfile build
HBASE-23888 PreCommit-HBASE-Build ignores the `ATTACHMENT_ID` provided by
PreCommit-Admin
HBASE-23875 Add JDK11 compilation and unit test support to Jira attached
patch precommit

All the work is in 23879 and 23888. Since I've already done the work for
JDK11 on GitHub PreCommit (HBASE-23767), 23875 should be fairly
straightforward. If we're going to keep both, we should add one more step,
which is to fold them both into the same `Jenkinsfile` and
`jenkins_precommit_yetus.sh` files.

Since the above work is blocking the merge of JDK11 support for GitHub
PreCommit (HBASE-23767) and Nightlies (HBASE-23876), I'm going to disable
the Jira PreCommit job (rename it, so that PreCommit-Admin no longer finds
it), and merge the JDK11 branches. If folks come back around missing the
Jira PreCommit, we can revive this discussion.

Thanks,
Nick

On Mon, Feb 24, 2020 at 5:38 PM 张铎(Duo Zhang)  wrote:

> If it is not easy to add JDK11 support for the PreCommit-HBASE-Build job
> then I'm +1 with stopping it. We should make all the contribution ways have
> the same pre commit check.
>
> > Nick Dimiduk wrote on Tue, Feb 25, 2020 at 7:33 AM:
>
> > > The bot is for lots of projects, not only HBase, so we can not shut it
> > down.
> >
> > Pardon me Duo. I am proposing we terminate PreCommit-HBASE-Build. The
> > PreCommit-Admin job appears to be the responsibility of Yetus and is out
> of
> > scope of my discussion here.
> >
> > > There's already debt on that job; Nick has been starting to pay it down
> > and I'd imagine that's how we got to this thread about just turning it
> off.
> >
> > Sean is calling me out here, and it's correct.
> >
> > I have recently been looking to add JDK11 support to all of our CI jobs
> > (HBASE-23875 and friends), and Jira-attached patch files is the one that
> > demands the most work, just to bring it up to par with the rest. See
> > HBASE-23874 and HBASE-23879 as prerequisites. And then today I
> uncovered
> > this little gem: it looks like PreCommit-HBASE-Build is broken for all
> but
> > whatever is the first attachment file recognized as a patch.
> > https://issues.apache.org/jira/browse/HBASE-23888
> >
> > Final nail in the coffin is that it seems the global PreCommit-Admin job
> is
> > not actually running on its intended 10-minute interval. It broke today
> > and no one noticed until I went looking. When it does run, it's somehow
> > missing newly attached files. This is evident by way of the output of the
> > last couple job runs missing Issue/Attachment pairs for the various
> > attachments on HBASE-23874.
> >
> > So yes, I do believe that a PR results in a better code review. I'm also
> > selfishly interested in cutting work that we don't *have* to do,
> supporting
> > a process that isn't used by the majority of our contributors.
> >
> > On Sat, Feb 22, 2020 at 6:52 AM Sean Busbey  wrote:
> >
> > > There's a dedicated precommit job just for the hbase main repo. We can
> > shut
> > > it down without impacting other projects or even other hbase repos.
> > There's
> > > already debt on that job; Nick has been starting to pay it down and I'd
> > > imagine that's how we got to this thread about just turning it off. For
> > > example branch 1 handling is currently broken for the jira patch tester
> > but
> > > not the GitHub PR tester.
> > >
> > > I personally don't find it more difficult to evaluate either kind of
> > > contribution because I rely on Apache Yetus Smart Apply Patch which
> "just
> > > works" most of the time given just a jira key to grab either patch or
> > PRs.
> > >
> > > However, I definitely agree with the much more difficult commenting
> > > workflow with just jira. Personally that means if my feedback is going
> to
> > > be more than "lgtm" then I ask the contributor to shift to a PR.
> > >
> > > I looked at runs of the jira patch precommit bit over Feb and most of
> > them
> > > look like either duplicate work where it ended up testing a GitHub PR
> > that
> > > the dedicated PR precommit also tested or an old patch getting retested
> >

Pending JDK11 changes to CI

2020-03-04 Thread Nick Dimiduk
Heya,

I want to send a heads-up for those not following along the efforts toward
2.3.x and the thread "[DISCUSS] Drop support for code contribution via Jira
attached patch". The summary is outlined at the end of that thread, but to
copy over here.

>  I'm going to disable the Jira PreCommit job (rename it, so that
PreCommit-Admin no longer finds it), and merge the JDK11 branches.

From the release notes on HBASE-23767,

> Rebuild our Dockerfile with support for multiple JDK versions. Use
> multiple stages in the Jenkinsfile instead of yetus's multijdk because
> of YETUS-953. Run those multiple stages in parallel to speed up
> results.
>
> Note that multiple stages means multiple Yetus invocations means
> multiple comments on the PreCommit. This should become more obvious to
> users once we can make use of GitHub Checks API, HBASE-23902.

These changes will be applied to master and branch-2. As far as I can tell,
they should not impact any other release branches. However, if I've missed
something, and you notice some issue, please speak up.

Thanks,
Nick


Re: Pending JDK11 changes to CI

2020-03-05 Thread Nick Dimiduk
HBASE-23876 and HBASE-23767 are pushed to master and branch-2. I'll be
keeping an eye on things over the coming days. Thanks for your patience as
we work through these enhancements.

-n

On Wed, Mar 4, 2020 at 9:58 AM Nick Dimiduk  wrote:

> Heya,
>
> I want to send a heads-up for those not following along the efforts
> toward 2.3.x and the thread "[DISCUSS] Drop support for code contribution
> via Jira attached patch". The summary is outlined at the end of that
> thread, but to copy over here.
>
> >  I'm going to disable the Jira PreCommit job (rename it, so that
> PreCommit-Admin no longer finds it), and merge the JDK11 branches.
>
> From the release notes on HBASE-23767,
>
> > Rebuild our Dockerfile with support for multiple JDK versions. Use
> > multiple stages in the Jenkinsfile instead of yetus's multijdk because
> > of YETUS-953. Run those multiple stages in parallel to speed up
> > results.
> >
> > Note that multiple stages means multiple Yetus invocations means
> > multiple comments on the PreCommit. This should become more obvious to
> > users once we can make use of GitHub Checks API, HBASE-23902.
>
> These changes will be applied to master and branch-2. As far as I can
> tell, they should not impact any other release branches. However, if I've
> missed something, and you notice some issue, please speak up.
>
> Thanks,
> Nick
>


Re: [DISCUSS] Branching strategy for ongoing branch-2 releases

2020-03-10 Thread Nick Dimiduk
Alright Duo, you've convinced me to continue with the branch-x.y strategy
for the 2.3.0 release line. I strongly believe that in order to get to a
more frequent minor release cadence, we'll want to make the process change
that I propose, but I agree that there's a lot of work to do for
stabilizing the HEAD of a release line. Hopefully this time next year we'll
be in a better place for this.

Thanks,
Nick

On Wed, Mar 4, 2020 at 4:00 PM 张铎(Duo Zhang)  wrote:

> Oh missed the last part
> >
> > IMHO, we already treat master as the "dev complete"
> > branch, so what's the benefit of doing so also with branch-2?
> >
> This is because of our release model. We consider master as 3.0.0. That
> exactly matches the semantic versioning, master is for the next major
> release line, branch-x is for the next minor release line, and branch-x.x
> is for the next patch release. And once you want to cut a major release,
> branch it out as branch-x, and once you want to cut a minor release, branch
> it out as branch-x.x.
>
> 张铎(Duo Zhang) wrote on Thu, Mar 5, 2020 at 7:55 AM:
>
> >
> >
> >> Nick Dimiduk wrote on Thu, Mar 5, 2020 at 12:44 AM:
> >
> >> > What will be the criterion for a new patch release with this model?
> >>
> >> From my previous RM experience, I really liked the process of the RM
> >> coming
> >> around once/month, check the branch for activity, and if there's enough
> >> fixes to justify, do a release. In my proposed model, that monthly
> cadence
> >> doesn't change. The RM would scan Jira for the release line branch
> >> (branch-2, in this case), looking for bug fixes. They create a "patch
> >> release branch" from the minor version's last release tag, queue up
> >> anything that looks relevant onto a temporary "patch release branch".
> They
> >> cut the RC from that branch. When an RC is accepted, the permanent tag
> is
> >> pushed and the "patch release branch" is deleted.
> >>
> > But on branch-2.1, I've done about 10 releases. I just checked the issues
> > which have fix versions on 2.1.x and moved out unresolved ones and then
> > made a release.
> > What I can see is that we still had 50+ fixes even for 2.1.9. I do not
> > think it is easy work for an RM to backport 50+ patches on his/her
> > own, plus you need to review maybe hundreds of issues on jira to decide
> > whether it should be backported. Your proposal is just increasing the work
> > of RM...
> >
> > > Stabilizing a branch with lots of new features is just
> human-impossible.
> >>
> >> Well, not human-impossible, but I agree that it is a lot of work. We do
> >> this ahead of every minor release, so what changes? Isn't it better to
> do
> >> the stabilization on branch-2 (or master) than after a minor release
> line
> >> branch is cut? Isn't it better to do the stabilization on a feature
> branch
> >> of that big new feature, before it gets merged?
> >>
> > These are different levels of stabilization. Usually if all UTs are fine,
> > we will let a feature in. This is enough for a contributor. But for a
> minor
> > release, we need to run ITBLL several times. That's why it is not a good
> > idea to do the final stabilizing on branch-2. There are only two results,
> > either you can not stabilize the branch as we still pull in new features,
> or
> > you block all the backports for a while...
> >
> >>
> >> > And this is also why we use semantic versioning. Major for big
> >> incompatible features, minor for almost compatible features, and patch
> >> with
> >> almost no new features.
> >>
> >> Nothing I've proposed here is against or incompatible with semantic
> >> versioning.
> >>
> > But the work for an RM will be increased dramatically if you still want to
> > make patch releases. This is not good. Usually we consider a patch
> release
> > to be stable after several patch releases, and I think this proposal will
> > lead to very very few patch releases. Trust me, people are lazy, we are
> not
> > a company, all people are volunteers, so do not put too many works on a
> > single person...
> >
> >>
> >> > So for me, if we think 2.3.x will be the last branch-2 minor release
> line,
> >> and new features are not expected to be backported to branch-2 by
> default,
> >> then it is OK that we just make release from branch-2.
> >>
> >> I'm not at all suggesting that branch-2 should e

Re: [DISCUSS] Bump hadoop versions on up coming 2.3.0 and 3.0.0 releases

2020-03-13 Thread Nick Dimiduk
> For me, I prefer we just bump the hadoop version of 2.3.0 directly to
2.10.0, as 2.9.x is almost dead too.

For 2.3, I raised this question earlier on "[DISCUSS] Hadoop dependency
versions for 2.3" [0]. Our conclusion was "if it ain't broke, don't fix it."

Since we've had no previous communication of dropping support for 2.8, I
was planning to release 2.3.x as the last HBase release line with support
for Hadoop 2.8.x. I haven't investigated the changelog closely, nor am I
fully versed in the implications this might have in our interactions with
the HDFS APIs, so I don't have a technical argument for or against moving
forward this minor version dependency.

Lacking further information, I am -0 on changing this dependency. If we do
decide to bump the minimum Hadoop minor version for 2.3.x, I agree that we
should target 2.10.

Thanks for bringing the new information to light and raising the question, Duo,
and for your perspective, Wei-Chiu.

[0]:
https://lists.apache.org/thread.html/r2e4d47ebc49cb25a09e49dde1a652d5e952266547238b8e2d90685db%40%3Cdev.hbase.apache.org%3E



On Wed, Mar 11, 2020 at 5:57 PM Wei-Chiu Chuang  wrote:

> +1 I'd like to encourage the community to update and reduce the Hadoop 2.x
> presence as much as possible.
>
> There is not an official DISCUSS/VOTE thread to EOL Hadoop 2.9 yet, even
> though so far the feedback has been quite receptive to the idea.
>
> Hadoop 2.10 is meant to be a "bridge release" for those who are not ready
> to upgrade to Hadoop 3.x.
> Given that the main contributors (LinkedIn, Microsoft and Verizon Media)
> are on 2.10, bypassing 2.9 altogether, the Hadoop 2.9 is not going to get
> much attention. Looking at git history, branch-2.9's got just 13 commits.
>
>
>
> On Wed, Mar 11, 2020 at 5:35 PM 张铎(Duo Zhang) 
> wrote:
>
> > Hadoop now has a wiki page to show the EOL releases lines
> >
> >
> >
> https://cwiki.apache.org/confluence/display/HADOOP/EOL+(End-of-life)+Release+Branches
> >
> >
> > 2.8.x is finally dead, so we'd better at least upgrade to 2.9.x in newer
> > release lines.
> >
> > And in this announcement
> >
> >
> >
> https://lists.apache.org/thread.html/r348f7bc93a522f05b7cce78a911854d128a6b1b8bd8124bad4d06ce6%40%3Cuser.hadoop.apache.org%3E
> >
> >
> > They even say you'd better upgrade to 2.10+.
> >
> > For me, I prefer we just bump the hadoop version of 2.3.0 directly to
> > 2.10.0, as 2.9.x is almost dead too. And since we still support hadoop
> > 2.9.x on 2.2.x release line, which is the current stable release line,
> > which should be fine for users.
> >
> > And for master branch, I suggest we just drop all the support for hadoop
> > 2.x and bump hadoop to 3.1.x directly.
> >
> > Thoughts? Thanks.
> >
>


Re: Working toward 2.3.0

2020-03-13 Thread Nick Dimiduk
Hello everyone,

An update here is well overdue. There's a lot of progress that's been made
over the last 6 weeks. We've seen a number of new features wrapped up and
land on branch-2, and a lot of effort has been done to prune and weed our
garden of unruly unit tests. We've also had a number of healthy
discussions, about our release process as well as our minimum Hadoop
version. I would still like to see our unit test suite running on JDK11,
which is an effort I will continue to pursue as I'm able (subtasks of
HBASE-22972).

I think we're finally ready to make branch-2.3. Once the branch is live, I
ask that folks pause actively backporting new features to branch-2.3.
Please continue with bug fixes. There is at least one data correctness
issue of which I am aware (HBASE-23286). Patches that improve and stabilize
our test suite are also strongly encouraged. We've seen stray blue
nightlies, but not often and not recently. If you're looking for direction
in this area, search Jira for "flakey" or ping me or Stack on Slack. The
GitHub PR PreCommit job now archives test logs, so it is now easier to
retrieve high fidelity details from runs on Apache infrastructure.

If you have concerns or objections, please raise them here. If there are
specific bugs you'd like me to track, please let me know. Otherwise, I'll
push the branch on Monday, start building a changes list and learning the
release tools.

Thank you everyone for your contributions and hard work.
Nick

On Mon, Jan 27, 2020 at 4:49 PM Nick Dimiduk  wrote:

> Heya,
>
> I wanted to give an update on progress toward the first 2.3.0 RCs.
>
> Jira has been cleaned up, with a bunch of unlikely tickets kicked out.
> FixVersions have also been audited to ensure they match what's in git. On
> first glance, this is shaping up to be a fat release, with over 800 issues
> in this version [0]. However, it's not quite as scary as all that. I was
> able to assemble a report of only the issues that are new to this version,
> tickets that have not seen a previous release [1]. This list is shorter,
> only about 175 currently, and when it's all said and done, I expect we'll
> land in somewhere on the order of 200.
>
> There are yet a few major tickets to land before the release branch is
> cut. The two bigger feature work items that I have my eyes on
> are HBASE-18095 and HBASE-22978. I've also marked HBASE-22972 as a
> blocker. If you have other new feature work you're hoping to get into
> 2.3.0, please set the fixVersion and ping me in the comments or your PR. If
> you have some spare cycles, please take a pass through our open PR's [2];
> there's quite a bit of low-hanging fruit in there that could use some
> attention.
>
> Our test situation is still looking a bit rough [3]. A couple brave souls
> have started working through the flakies [4], but there's more work to be
> done.
>
> That's where we're at. With this amount of work, I expect that puts the
> first RC sometime the week of Feb 10 or Feb 17. If folks want binaries they
> can start to abuse before then, let me know and I'll see what can be done.
>
> That's what I have for today. Please speak up with questions and concerns.
> Preemptive thank you to everyone who's been pushing to bring this release
> together. Your work has not gone unnoticed.
>
> Thanks,
> Nick
>
> [0]: https://issues.apache.org/jira/projects/HBASE/versions/12344893
> [1]: http://home.apache.org/~ndimiduk/new_for_branch-2.csv
> [2]: https://github.com/apache/hbase/pulls
> [3]:
> https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2/
> [4]:
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2/lastSuccessfulBuild/artifact/dashboard.html
>


Re: Request for Slack channel invite

2020-03-16 Thread Nick Dimiduk
Invitation sent.

On Mon, Mar 16, 2020 at 8:48 AM Prabhakar Reddy 
wrote:

> Please send me a invite for hbase slack channel.
>

