Re: Major Compaction Concerns

Ted Yu Sun, 08 Jan 2012 12:22:32 -0800

Your request in the first paragraph below deserves a JIRA.

For 2.b I agree a bug should be filed.
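
To make the 2.b expectation concrete, here is a minimal sketch of the lookup order suggested by the Store.getNextMajorCompactTime() fragment quoted further down: the region server's own configuration supplies the value (24 hours if unset), and a value set on the column family overrides it. This is not the HBase source. The class name, the resolvePeriod helper and the plain maps standing in for the server configuration and the family attributes are assumptions for illustration, and the real method may do more than what is shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: plain maps stand in for the region server Configuration
// and for the column family attributes. Not the actual HBase implementation.
class MajorCompactionPeriodSketch {
    static final String KEY = "hbase.hregion.majorcompaction";
    static final long DEFAULT_PERIOD_MS = 1000L * 60 * 60 * 24; // 24 hours

    // Hypothetical helper: the server-side value (or the default) is used
    // unless the column family carries its own value for the same key.
    static long resolvePeriod(Map<String, String> serverConf,
                              Map<String, String> familyValues) {
        String fromServer = serverConf.get(KEY);
        long period = (fromServer != null)
                ? Long.parseLong(fromServer)
                : DEFAULT_PERIOD_MS;
        String fromFamily = familyValues.get(KEY);
        if (fromFamily != null) {
            period = Long.parseLong(fromFamily);
        }
        return period;
    }

    public static void main(String[] args) {
        Map<String, String> serverConf = new HashMap<>();
        Map<String, String> familyValues = new HashMap<>();

        // Nothing set anywhere: the 24h default applies.
        System.out.println(resolvePeriod(serverConf, familyValues));

        // Server config set to 0, family untouched: the server value applies,
        // although nothing was written into the family attributes.
        serverConf.put(KEY, "0");
        System.out.println(resolvePeriod(serverConf, familyValues));

        // A per-family value takes precedence over the server config.
        familyValues.put(KEY, "3600000");
        System.out.println(resolvePeriod(serverConf, familyValues));
    }
}
```

Under this reading, a server-side setting is applied on the region server without being copied into the table or column family descriptor, which would be consistent with a client not seeing it when it inspects only the descriptor.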


For major compaction, adding more logging on the region server side should
help you understand the situation better, assuming you are interested in
digging further.
Please upgrade to 0.90.5, or wait for the 0.90.6 release, which is
slated for Jan. 19th.

After the upgrade, the logs and code will be closer to the tip of the 0.90
branch.

Thanks for summarizing your findings.

On Sun, Jan 8, 2012 at 12:04 PM, Mikael Sitruk <mikael.sit...@gmail.com> wrote:

> In fact I think that for 2.a the current implementation is misleading.
> Creating a connection and getting the configuration from the connection
> should return the configuration of the cluster.
> Requesting the configuration used to build an object should return the
> configuration set on the object
> Additionally it should be a new method like getConfigurations(), or
> getClusterConfigurations() returning a map of serverinfo and
> configuration.  Another option is to add on the HRegionServer and HMaster a
> method getConfiguration() returning the configuration object used by the
> RegionServer or Master
>
> Regarding 2.b yes I tried but it did not return the setting from the
> cluster configuration (again server has non default configuration, table
> was not configured with specific values then cluster configuration should
> apply on the table object). So I see it as problematic.
>
> Mikael.s
>  On Jan 8, 2012 7:54 PM, <yuzhih...@gmail.com> wrote:
>
> > About 2b, have you tried getting the major compaction setting from column
> > descriptor ?
> >
> > For 2a, what you requested would result in new methods of
> > HBaseConfiguration class to be added. Currently the configuration on
> client
> > class path would be used.
> >
> > Cheers
> >
> >
> >
> > On Jan 8, 2012, at 9:28 AM, Mikael Sitruk <mikael.sit...@gmail.com>
> wrote:
> >
> > > Ted hi
> > > First, thanks for answering. Regarding the JIRAs, I will file them.
> > > Second, it seems that i did not explain myself correctly regarding
> 2.a. -
> > > As you i do not expect that a configuration set on my client will be
> > > propagated to the cluster, but i do expect that if i set a
> configuration
> > on
> > > a server then doing connection.getConfiguration() from a client i will
> > get
> > > the configuration from the cluster.
> > > Currently the configuration returned is from the client config.
> > > So the problem is that you have no way to check the configuration of a
> > > cluster.
> > > I would expect to have some API to return the cluster config and even
> > > getting a map <serverInfo, config> so it can be easy to check cluster
> > > problem using code.
> > >
> > > 2.b. I know this code, and i tried to validate it. I set in the server
> > > config the "hbase.hregion.majorcompaction" to "0", then start the
> server
> > > (cluster). Since from the UI or from JMX this parameter is not visible
> at
> > > the cluster level, I try to get the value from the client (to see that
> > the
> > > cluster is using it)
> > >
> > > *HTableDescriptor hTableDescriptor =
> > > conn.getHTableDescriptor(Bytes.toBytes("my table"));*
> > >
> > > *hTableDescriptor.getValue("hbase.hregion.majorcompaction")*
> > > but i still got 24h (and not the value set in the config)! that was my
> > > problem from the beginning! ==> Using the config (on the server side)
> > will
> > > not propagate into the table/column family
> > >
> > > Mikael.S
> > >
> > > On Sun, Jan 8, 2012 at 7:13 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > >> I am not an expert in the major compaction feature.
> > >> Let me try to answer questions in #2.
> > >>
> > >> 2.a
> > >>> If I set the property via the configuration shouldn’t all the cluster
> > be
> > >>> aware of?
> > >>
> > >> There're multiple clients connecting to one cluster. I wouldn't expect
> > >> values in the configuration (m_hbConfig) to propagate onto the
> cluster.
> > >>
> > >> 2.b
> > >> Store.getNextMajorCompactTime() shows that
> > "hbase.hregion.majorcompaction"
> > >> can be specified per column family:
> > >>
> > >> long getNextMajorCompactTime() {
> > >>   // default = 24hrs
> > >>   long ret = conf.getLong(HConstants.MAJOR_COMPACTION_PERIOD,
> > >> 1000*60*60*24);
> > >>   if (family.getValue(HConstants.MAJOR_COMPACTION_PERIOD) != null) {
> > >>
> > >> 2.d
> > >>> d. I tried also to setup the parameter via hbase shell but setting
> such
> > >>> properties is not supported. (do you plan to add such support via the
> > >>> shell?)
> > >>
> > >> This is a good idea. Please open a JIRA.
> > >>
> > >> For #5, HBASE-3965 is an improvement and doesn't have a patch yet.
> > >>
> > >> Allow me to quote Alan Kay: 'The best way to predict the future is to
> > >> invent it.'
> > >>
> > >> Once we have a patch, we can always backport it to 0.92 after some
> > people
> > >> have verified the improvement.
> > >>
> > >>> 6.       In case a compaction (major) is running it seems there is no
> > way
> > >>> to stop-it. Do you plan to add such feature?
> > >>
> > >> Again, logging a JIRA would provide a good starting point for
> > discussion.
> > >>
> > >> Thanks for the verification work and suggestions, Mikael.
> > >>
> > >> On Sun, Jan 8, 2012 at 7:27 AM, Mikael Sitruk <
> mikael.sit...@gmail.com
> > >>> wrote:
> > >>
> > >>> I forgot to mention, I'm using HBase 0.90.1
> > >>>
> > >>> Regards,
> > >>> Mikael.S
> > >>>
> > >>> On Sun, Jan 8, 2012 at 5:25 PM, Mikael Sitruk <
> mikael.sit...@gmail.com
> > >>>> wrote:
> > >>>
> > >>>> Hi
> > >>>>
> > >>>>
> > >>>>
> > >>>> I have some concern regarding major compactions below...
> > >>>>
> > >>>>
> > >>>>   1. According to best practices from the mailing list and from the
> > >>>>   book, automatic major compaction should be disabled. This can be
> > >> done
> > >>> by
> > >>>>   setting the property ‘hbase.hregion.majorcompaction’ to ‘0’.
> > >>> Nevertheless
> > >>>>   even after having done this I STILL see “major compaction”
> messages
> > >>> in
> > >>>>   logs. therefore it is unclear how can I manage major compactions.
> > >> (The
> > >>>>   system has heavy insert - uniformly on the cluster, and major
> > >>> compaction
> > >>>>   affect the performance of the system).
> > >>>>   If I'm not wrong it seems from the code that: even if not
> requested
> > >>>>   and even if the indicator is set to '0' (no automatic major
> > >>> compaction),
> > >>>>   major compaction can be triggered by the code in case all store
> > >> files
> > >>> are
> > >>>>   candidate for a compaction (from Store.compact(final boolean
> > >>> forceMajor)).
> > >>>>   Shouldn't the code add a condition that automatic major compaction
> > >> is
> > >>>>   disabled??
> > >>>>
> > >>>>   2. I tried to check the parameter  ‘hbase.hregion.majorcompaction’
> > >> at
> > >>>>   runtime using several approaches - to validate that the server
> > >> indeed
> > >>>>   loaded the parameter.
> > >>>>
> > >>>> a. Using a connection created from local config
> > >>>>
> > >>>> *conn = (HConnection) HConnectionManager.getConnection(m_hbConfig);*
> > >>>>
> > >>>> *conn.getConfiguration().get("hbase.hregion.majorcompaction")*
> > >>>>
> > >>>> returns the parameter from local config and not from cluster. Is it
> a
> > >>> bug?
> > >>>> If I set the property via the configuration shouldn’t all the
> cluster
> > >> be
> > >>>> aware of? (supposing that the connection indeed connected to the
> > >> cluster)
> > >>>>
> > >>>> b.  fetching the property from the table descriptor
> > >>>>
> > >>>> *HTableDescriptor hTableDescriptor =
> > >>>> conn.getHTableDescriptor(Bytes.toBytes("my table"));*
> > >>>>
> > >>>> *hTableDescriptor.getValue("hbase.hregion.majorcompaction")*
> > >>>>
> > >>>> This returns the default parameter value (1 day), not the
> > parameter
> > >>>> from the configuration (on the cluster). It seems to be a bug, isn’t
> > >> it?
> > >>>> (the parameter from the config, should be the default if not set at
> > the
> > >>>> table level)
> > >>>>
> > >>>> c. The only way I could set the parameter to 0 and really see it is
> > via
> > >>>> the Admin API, updating the table descriptor or the column
> descriptor.
> > >>> Now
> > >>>> I could see the parameter on the web UI. So is it the only way to
> set
> > >>>> correctly the parameter? If setting the parameter via the
> > configuration
> > >>>> file, shouldn’t the webUI show this on any table created?
> > >>>>
> > >>>> d. I tried also to setup the parameter via hbase shell but setting
> > such
> > >>>> properties is not supported. (do you plan to add such support via
> the
> > >>>> shell?)
> > >>>>
> > >>>> e. Generally is it possible to get via API the configuration used by
> > >> the
> > >>>> servers? (at cluster/server level)
> > >>>>
> > >>>>    3.  I ran both major compaction  requests from the shell or from
> > >> API
> > >>>> but since both are async there is no progress indication. Neither
> the
> > >> JMX
> > >>>> nor the Web will help here since you don’t know if a compaction task
> > is
> > >>>> running. Tailing the logs is not an efficient way to do this
> either.
> > >>> The
> > >>>> point is that I would like to automate the process and avoid
> > compaction
> > >>>> storm. So I want to do that region by region, but if I don’t know
> when a
> > >>>> compaction started/ended I can’t automate it.
> > >>>>
> > >>>> 4.       In case there is no compaction files in queue (but still
> you
> > >>> have
> > >>>> more than 1 storefile per store e.g. minor compaction just finished)
> > >> then
> > >>>> invoking major_compact will indeed decrease the number of store
> files,
> > >>> but
> > >>>> the compaction queue will remain to 0 during the compaction task
> > >>> (shouldn’t
> > >>>> the compaction queue increase by the number of file to compact and
> be
> > >>>> reduced when the task ended?)
> > >>>>
> > >>>>
> > >>>> 5.       I saw already HBASE-3965 for getting status of major
> > >> compaction,
> > >>>> nevertheless it has been removed from 0.92, is it possible to put it
> > >> back?
> > >>>> Even sooner than 0.92?
> > >>>>
> > >>>> 6.       In case a compaction (major) is running it seems there is
> no
> > >> way
> > >>>> to stop-it. Do you plan to add such feature?
> > >>>>
> > >>>> 7.       Do you plan to add functionality via JMX (starting/stopping
> > >>>> compaction, splitting....)
> > >>>>
> > >>>> 8.       Finally there were some request for allowing custom
> > >> compaction,
> > >>>> part of this was given via the RegionObserver in HBASE-2001,
> > >> nevertheless
> > >>>> do you consider adding support for custom compaction (providing real
> > >>>> pluggable compaction strategy not just observer)?
> > >>>>
> > >>>>
> > >>>> Regards,
> > >>>> Mikael.S
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> Mikael.S
> > >>>
> > >>
> > >
> > >
> > >
> > > --
> > > Mikael.S
> >
>
