Anton, Maxim,

Are we planning to release the views as part of 2.8? I don't see them listed in the important features section:
https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+2.8#ApacheIgnite2.8-Themostimportantreleasetasks
- Denis

On Wed, Feb 14, 2018 at 1:49 AM Anton Vinogradov <avinogra...@gridgain.com> wrote:

Vova,

Could you confirm that https://issues.apache.org/jira/browse/IGNITE-7527 is ready to be merged?

On Wed, Feb 14, 2018 at 12:01 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:

I would start with NODES and NODE_ATTRIBUTES as the simplest ones.

On Tue, Feb 13, 2018 at 4:10 AM, Denis Magda <dma...@apache.org> wrote:

Alex P, sounds like a good plan to me.

Vladimir, do you have any suggestions or corrections?

—
Denis

On Feb 12, 2018, at 4:57 AM, Alex Plehanov <plehanov.a...@gmail.com> wrote:

The views engine and the first view are almost ready to merge (review comments are resolved). Which views should we take next? My proposal: NODES, NODE_ATTRIBUTES, NODE_METRICS, NODE_HOSTS and NODE_ADDRESSES, since these views are straightforward and all topology data is available on each node. Any objections?

2018-01-25 16:27 GMT+03:00 Alex Plehanov <plehanov.a...@gmail.com>:

Anton, Vladimir, I've made some fixes. There is only one view left, and it's renamed to 'IGNITE.LOCAL_TRANSACTIONS'.

High-level design of the solution: when IgniteH2Indexing starts, it creates and starts a new GridH2SysViewProcessor, which creates and registers in H2 (via its own table engine) all implementations of system views. Each system view implementation extends the base abstract class GridH2SysView. A view implementation describes its columns, their types and indexes in the constructor, and must override the getRows method for data retrieval (this method is called by the H2-compatible table and index implementations for Ignite system views). Almost no changes to the existing parsing engine were made, except in a few places where a GridH2Table instance was expected but system views use another class.

New PR: [1]. Please have a look.

[1] https://github.com/apache/ignite/pull/3433
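To make the high-level design above concrete, here is a minimal, self-contained sketch of what a view implementation might look like. This is not the PR code: the base class and all names except GridH2SysView/getRows are hypothetical stand-ins, and the real class additionally declares H2 column types and indexes and adapts rows to H2's table and index interfaces.

import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-in for the base class described above.
// The real GridH2SysView (see the PR) is registered with H2 by
// GridH2SysViewProcessor via a custom table engine.
abstract class SysViewSketch {
    private final String name;
    private final String[] cols;

    // Columns (and, in the real class, their types and indexes) are
    // declared in the constructor.
    protected SysViewSketch(String name, String... cols) {
        this.name = name;
        this.cols = cols;
    }

    // Called by the H2-compatible table and index implementations to
    // retrieve the view data.
    abstract List<Object[]> getRows();
}

// Hypothetical LOCAL_TRANSACTIONS view; the column names follow the
// examples later in this thread and may differ in the merged code.
class LocalTransactionsViewSketch extends SysViewSketch {
    LocalTransactionsViewSketch() {
        super("LOCAL_TRANSACTIONS", "XID", "START_TIME");
    }

    @Override List<Object[]> getRows() {
        List<Object[]> rows = new ArrayList<>();
        // The real view would iterate active transactions from the kernal
        // context; a stub row stands in here.
        rows.add(new Object[] {"xid-1", System.currentTimeMillis()});
        return rows;
    }
}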
2018-01-24 19:12 GMT+03:00 Anton Vinogradov <avinogra...@gridgain.com>:

I've created IEP-13 [1] to cover all cases. Feel free to create issues.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75962769

On Wed, Jan 24, 2018 at 6:10 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:

Let's start with a single, simple view, e.g. LOCAL_TRANSACTIONS. We will review and merge it along with the necessary infrastructure, then handle the remaining views in separate tickets and separate focused discussions.

On Wed, Jan 24, 2018 at 5:29 PM, Alex Plehanov <plehanov.a...@gmail.com> wrote:

1) It's not a fundamental point; I can change the schema. INFORMATION_SCHEMA was used because it already exists and is usually used for metadata tables and views. Your proposal is to use the schema "IGNITE", do I understand you right? BTW, for now we can't query other (H2) meta tables from INFORMATION_SCHEMA, so the Ignite system views are the only views available to query from this schema.

2) Exactly for this reason the IGNITE_INSTANCE view is useful: to determine which node we are connected to.

3) As a first phase, in my opinion, local views will be enough. Performance and caching of distributed views should be discussed in later phases, when the distributed views implementation is planned. In the current implementation I tried to use indexing for local views wherever possible.

4) I don't think JVM info is more critical than, for example, cache or node information. When are authorization capabilities planned to be implemented?

About local data: yes, we can rename all currently implemented views over local node data to LOCAL_..., and (someday) create new whole-cluster views (which use distributed requests) without a prefix or, for example, with a CLUSTER_ prefix. But some views can show whole-cluster information using only local node data, without distributed requests (for example IGNITE_NODE_METRICS, IGNITE_PART_ASSIGNMENT, IGNITE_PART_ALLOCATION, IGNITE_NODES, etc.). Are they local or cluster views in this concept? Which prefix should be used? And what about caches? Are they local or cluster? On a local node we can see cluster-wide caches (replicated and distributed) as well as caches that exist on the current node only, and the local cache list may differ from node to node. Which prefix should be used for this view? And one more thing: for some views there is no sense in making them cluster-wide (for example IGNITE_INSTANCE). Should we name it LOCAL_INSTANCE without creating an INSTANCE view?

So, next steps: split the PR, change the schema name (IGNITE?), change the view name for caches (CACHES, LOCAL_CACHES?).

2018-01-24 13:03 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:

Hi Alex,

System views could be an extremely valuable addition to Ignite. Ideally, a user should be able to monitor and manage the state of the whole cluster from a single SQL command line. We have had plans to implement this for a very long time. However, this is a very sensitive task which should take a lot of moving pieces into account, such as usability, consistency, performance, security, etc.

Let me point out several major concerns I see at the moment:

1) Usability: INFORMATION_SCHEMA
This schema is part of the ANSI SQL standard. When creating system views, some vendors prefer to store them in a completely different predefined schema (Oracle, MS SQL). Others prefer to keep them in INFORMATION_SCHEMA directly. Both approaches could work. However, the latter breaks separation of concerns - we store typical metadata next to possibly sensitive system data.
It also makes security management more complex - system data is very sensitive, and we could no longer simply grant a user access to INFORMATION_SCHEMA. Instead, we would have to grant access on a per-view basis. For this reason my preference is to store system tables in a separate schema, not in INFORMATION_SCHEMA.

2) Consistency: local data
One of the implemented views is GridH2SysViewImplInstance. Normally SQL users communicate with Ignite through JDBC/ODBC drivers. These drivers are connected to a single node, typically a client node. Moreover, we will introduce a high-availability feature where drivers are able to connect to any address from a predefined list. This renders the view useless, as you do not know which node you are connected to. Also, local-only data cannot be joined in the general case - you will receive different results on different nodes. The same goes for transactions, JVM info, etc.

3) Performance
Suppose we fixed the consistency of transactions, and the view now shows transactions in the whole cluster with the possibility to filter them by node - this is what a user would expect out of the box. Another problem then appears - performance. How would we collect the necessary data? How would we handle joins, when a particular view could be scanned multiple times during query execution? How would we achieve sensible consistency? Most probably we would collect remote data once when the query starts, cache it somehow at the query session level, and then re-use it during joins. But again, this should be discussed separately.

4) Security: JVM info
We should define clear boundaries of what info is exposed. JVM data along with running threads is critically sensitive information. We should not expose it until we have authorization capabilities.

In order to start moving this code from prototype to production state, we should start with the simplest and most consistent views, e.g. IGNITE_CACHES. Let's move it to a separate PR, review the infrastructure code, review the view implementation, agree on proper naming and placement, and merge it. Then each and every view (or group of related views) should be discussed and reviewed separately.

As for node-local stuff, maybe we should move it to a separate schema or mark it with a special prefix, e.g. "IGNITE.TRANSACTIONS" - all transactions in the cluster, "IGNITE.LOCAL_TRANSACTIONS" - transactions on the local node. In this case we will be able to merge the "local" stuff shortly, and implement the more complex but at the same time much more useful distributed stuff later on.

Makes sense?

Vladimir.
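For illustration of the proposed split, a thin-client JDBC session might look like the sketch below. This assumes the views land in an IGNITE schema as suggested; the XID and START_TIME columns are borrowed from the examples later in this thread, and the address is a placeholder, so all of it may differ in the merged code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LocalTxViewExample {
    public static void main(String[] args) throws Exception {
        // Ignite thin JDBC driver; the address is a placeholder.
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1");
             Statement stmt = conn.createStatement();
             // LOCAL_TRANSACTIONS would show only transactions on the node
             // this connection happened to land on; the (future) cluster-wide
             // IGNITE.TRANSACTIONS view would not depend on the node.
             ResultSet rs = stmt.executeQuery(
                 "SELECT XID, START_TIME FROM IGNITE.LOCAL_TRANSACTIONS")) {
            while (rs.next())
                System.out.println(rs.getString("XID") + " started " + rs.getTimestamp("START_TIME"));
        }
    }
}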
> > > >>>>>> > > > >>>>>> Vladimir. > > > >>>>>> > > > >>>>>> > > > >>>>>> On Tue, Jan 23, 2018 at 8:30 PM, Alex Plehanov < > > > >>>> plehanov.a...@gmail.com> > > > >>>>>> wrote: > > > >>>>>> > > > >>>>>>> Hello, Igniters! > > > >>>>>>> > > > >>>>>>> For Ignite diagnostic usually it’s helpful to get some Ignite > > > >>>> internals > > > >>>>>>> information. But currently, in my opinion, there are no > > convenient > > > >>>>> tools > > > >>>>>>> for this purpose: > > > >>>>>>> > > > >>>>>>> · Some issues can be solved by analyzing log files. Log > > > >>> files > > > >>>>> are > > > >>>>>>> useful for dumps, but sometimes they are difficult to read. > Also > > > >>>>>>> interesting metrics can’t be received runtime by request, we > need > > > >>> to > > > >>>>> wait > > > >>>>>>> until Ignite will write these metrics by timeout or other > events. > > > >>>>>>> > > > >>>>>>> · JMX is useful for scalar metrics. Complex and table > data > > > >>> can > > > >>>>>> also > > > >>>>>>> be received, but it’s difficult to read, filter and sort them > > > >>> without > > > >>>>>>> processing by specialized external tools. For most frequently > > used > > > >>>>> cases > > > >>>>>>> almost duplicating metrics are created to show data in an > > > >>>> easy-to-read > > > >>>>>>> form. > > > >>>>>>> > > > >>>>>>> · Web-console is able to show table and complex data. > > > >>> Perhaps, > > > >>>>>>> someday web-console will contain all necessary dashboards for > > > >>> most > > > >>>>>> problem > > > >>>>>>> investigation, but some non-trivial queries will not be covered > > > >>>> anyway. > > > >>>>>>> Also web-console needs additional infrastructure to work. > > > >>>>>>> > > > >>>>>>> · External “home-made” tools can be used for non-trivial > > > >>>> cases. > > > >>>>>> They > > > >>>>>>> cover highly specialized cases and usually can’t be used as > > > >>> general > > > >>>>>> purpose > > > >>>>>>> tools. > > > >>>>>>> > > > >>>>>>> Sometimes we are forced to use more than one tool and join data > > by > > > >>>>> hands > > > >>>>>>> (for example, current thread dump and data from logs). > > > >>>>>>> > > > >>>>>>> Often RDBMS for diagnostic purposes provides system views (for > > > >>>> example, > > > >>>>>>> DBA_% and V$% in Oracle), which can be queried by SQL. This > > > >>> solution > > > >>>>>> makes > > > >>>>>>> all internal diagnostic information available in a readable > form > > > >>>> (with > > > >>>>>> all > > > >>>>>>> possible filters and projections) without using any other > > > >>> internal or > > > >>>>>>> external tools. My proposal is to create similar system views > in > > > >>>>> Ignite. > > > >>>>>>> > > > >>>>>>> I implement working prototype (PR: [1]). 
It contains the following views:

IGNITE_SYSTEM_VIEWS - Registered system views
IGNITE_INSTANCE - Ignite instance
IGNITE_JVM_THREADS - JVM threads
IGNITE_JVM_RUNTIME - JVM runtime
IGNITE_JVM_OS - JVM operating system
IGNITE_CACHES - Ignite caches
IGNITE_CACHE_CLUSTER_METRICS - Ignite cache cluster metrics
IGNITE_CACHE_NODE_METRICS - Ignite cache node metrics
IGNITE_CACHE_GROUPS - Cache groups
IGNITE_NODES - Nodes in topology
IGNITE_NODE_HOSTS - Node hosts
IGNITE_NODE_ADDRESSES - Node addresses
IGNITE_NODE_ATTRIBUTES - Node attributes
IGNITE_NODE_METRICS - Node metrics
IGNITE_TRANSACTIONS - Active transactions
IGNITE_TRANSACTION_ENTRIES - Cache entries used by transactions
IGNITE_TASKS - Active tasks
IGNITE_PART_ASSIGNMENT - Partition assignment map
IGNITE_PART_ALLOCATION - Partition allocation map

Many more useful views could be implemented (executor diagnostics, SPI diagnostics, etc.).
> > > >>>>>>> > > > >>>>>>> Some usage examples: > > > >>>>>>> > > > >>>>>>> Cache groups and their partitions, which used by transaction > more > > > >>>> than > > > >>>>> 5 > > > >>>>>>> minutes long: > > > >>>>>>> > > > >>>>>>> SELECT cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION, count(*) AS > > > >>>>> ENTITIES_CNT > > > >>>>>>> FROM INFORMATION_SCHEMA.IGNITE_TRANSACTIONS t > > > >>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_TRANSACTION_ENTRIES te ON t.XID > = > > > >>>>> te.XID > > > >>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_CACHES c ON te.CACHE_NAME = > > c.NAME > > > >>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_CACHE_GROUPS cg ON c.GROUP_ID = > > > >>> cg.ID > > > >>>>>>> WHERE t.START_TIME < TIMESTAMPADD('MINUTE', -5, NOW()) > > > >>>>>>> GROUP BY cg.CACHE_OR_GROUP_NAME, te.KEY_PARTITION > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> Average CPU load on server nodes grouped by operating system: > > > >>>>>>> > > > >>>>>>> SELECT na.VALUE, COUNT(n.ID), AVG(nm.AVG_CPU_LOAD) AVG_CPU_LOAD > > > >>>>>>> FROM INFORMATION_SCHEMA.IGNITE_NODES n > > > >>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_NODE_ATTRIBUTES na ON na.NODE_ID > > = > > > >>>> n.ID > > > >>>>>> AND > > > >>>>>>> na.NAME = 'os.name' > > > >>>>>>> JOIN INFORMATION_SCHEMA.IGNITE_NODE_METRICS nm ON nm.NODE_ID = > > > >>> n.ID > > > >>>>>>> WHERE n.IS_CLIENT = false > > > >>>>>>> GROUP BY na.VALUE > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> Top 5 nodes by puts to cache ‘cache’: > > > >>>>>>> > > > >>>>>>> SELECT cm.NODE_ID, cm.CACHE_PUTS FROM > > > >>>>>>> INFORMATION_SCHEMA.IGNITE_CACHE_NODE_METRICS cm > > > >>>>>>> WHERE cm.CACHE_NAME = 'cache' > > > >>>>>>> ORDER BY cm.CACHE_PUTS DESC > > > >>>>>>> LIMIT 5 > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> Does this implementation interesting to someone else? Maybe any > > > >>> views > > > >>>>> are > > > >>>>>>> redundant? Which additional first-priority views must be > > > >>> implemented? > > > >>>>> Any > > > >>>>>>> other thoughts or proposal? > > > >>>>>>> > > > >>>>>>> [1] https://github.com/apache/ignite/pull/3413 > > > >>>>>>> > > > >>>>>> > > > >>>>> > > > >>>> > > > >>> > > > >> > > > >> > > > > > > > > >