A failed Trafodion installation can leave the hbase:meta region stuck in the FAILED_OPEN state.

2016-03-08 Thread D. Markt
Hi,

  I ran into this situation during a recent installation and thought a
write-up might be useful to others who hit something similar in the future.
This isn't the only way to recover, but it is one option and it was proven
to work as expected.

Regards,
Dennis

  During a recent Trafodion cluster install, the daily build was broken in
such a way that much of the installation proceeded, but the Trafodion files
were not copied to each node.  This system was using CDH, but I assume the
following would happen for HDP as well.  After HBase was restarted as part
of the installation, I noticed the HBase icon was red.  I know this will
likely not look its best in plain text, but the hbase:meta status showed
(in a red box):

Region      State                                                RIT time (ms)
1588230740  hbase:meta,,1.1588230740 state=FAILED_OPEN,
            ts=Mon Mar 07 07:19:00 UTC 2016 (1289s ago),
            server=perf-sles-2.novalocal,60020,1457335120507     1289706

  Looking at the log file of the Region Server that was assigned the
hbase:meta region, there was this output:

2016-03-07 16:45:27,243 INFO org.apache.hadoop.hbase.regionserver.RSRpcServices: Open hbase:meta,,1.1588230740
2016-03-07 16:45:27,249 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=hbase:meta,,1.1588230740, starting to roll back the global memstore size.
java.lang.IllegalStateException: Could not instantiate a region instance.
        at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:5486)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5793)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5765)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5721)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5672)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:356)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:126)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
        at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:5475)
        ... 10 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2018)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2110)
        ... 11 more
2016-03-07 16:45:27,250 INFO org.apache.hadoop.hbase.coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''} failed, transitioning from OPENING to FAILED_OPEN in ZK, expecting version 115

After consulting with our installer expert, I confirmed the issue was in
fact that the needed files had not been copied to each node.  At that point,
one option would have been to re-install the previous build, or at least to
undo the changes made to point to the new build.  I did not try that, and
I'll leave that fallback option as a separate topic.
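Before falling back to configuration changes, it can be worth confirming the diagnosis directly: the stack trace above says the Region Server could not load org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion, so checking whether any jar on the node actually contains that class tells you whether the files were copied.  A small sketch (the jar path is hypothetical; point it at wherever your install places the Trafodion HBase jars):

```python
# Sketch: check whether a jar on this node contains the class the
# Region Server failed to load.  The jar path used below is hypothetical.
import zipfile


def has_class(jar_path, class_name):
    """Return True if the jar contains the given fully-qualified class."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Example (adjust the path for your CDH/Trafodion layout):
# has_class("/usr/lib/hbase/lib/hbase-trx.jar",
#           "org.apache.hadoop.hbase.regionserver.transactional"
#           ".TransactionalRegion")
```

If the class is missing from every jar on the Region Server's classpath, no restart will make the open succeed; either the files must be copied over or the HBase configuration must stop referencing them.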

  Instead, I took the path of seeing whether I could get HBase to come up
successfully without the new Trafodion installation being properly
completed.  To do that, there are two HBase properties that have to be
reset:

  * hbase.coprocessor.region.classes
  * hbase.hregion.impl

I actually deleted all of the properties in hbase-site.xml that Cloudera
Manager showed as non-default values, but I assume only the
hbase.hregion.impl property had to be removed.  Remember both to remove
both sets of properties and to save the configuration; I forgot one or the
other at first, and each time the restart hit the same basic error.
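For reference, this is roughly what the two entries look like in a Trafodion-modified hbase-site.xml.  The hbase.hregion.impl value matches the class named in the stack trace above; the coprocessor value here is only illustrative, so check your own file for the actual list before deleting anything:

```xml
<!-- Remove (or reset to default) to start HBase without the Trafodion jars. -->
<property>
  <name>hbase.hregion.impl</name>
  <!-- The class the Region Server failed to load above. -->
  <value>org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <!-- Illustrative value; your hbase-site.xml may list different classes. -->
  <value>org.apache.hadoop.hbase.coprocessor.transactional.TrxRegionObserver</value>
</property>
```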

  Once the configuration is properly updated, the restart will succeed, and
once the Region Server can open the hbase:meta region, all of the other
regions can be opened as well.  However, without Trafodion running, I would
assume none of the Trafodion tables should be acted upon.  This exercise
was to prove that HBase could be restarted and left running, so that when
the Trafodion installation was started again it would have a viable
Cloudera/HBase/HDFS environment to act on.



RE: how to tell the most time-consuming part for a given Trafodion query plan?

2016-03-08 Thread Selva Govindarajan
Hi Ming,



The counters or metrics returned by RMS in Trafodion are documented at
http://trafodion.apache.org/docs/2.0.0/sql_reference/index.html#sql_runtime_statistics.



Counters displayed in operator stats:

The DOP (Degree of Parallelism) determines the number of ESPs involved in
executing the Trafodion operator, or TDB (Task Definition Block).  The TDB
can be identified by the number in the Id column.  LC and RC denote the
left and right child of the operator.  Using these IDs and the parent TDB
ID (PaId), one can construct the query plan from this output.  The
Dispatches column gives an indication of how often the operator is
scheduled for execution by the Trafodion SQL engine scheduler.  When
scheduled, an operator runs, traversing the different steps within itself,
until it can't continue or until it yields on its own so that other
operators can be scheduled.
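As a small illustration of using the Id and PaId columns to rebuild the plan shape, here is a sketch over a few rows taken from the sample output later in this thread (top of the plan only):

```python
# Sketch: rebuild the operator tree from the Id and PaId columns of the
# RMS operator-stats output.  Rows are copied from the sample output
# shown later in this thread (top few operators only).
rows = [
    # (Id, PaId, TDB Name); the root operator has no parent.
    (13, None, "EX_ROOT"),
    (12, 13,   "EX_SPLIT_TOP"),
    (11, 12,   "EX_SEND_TOP"),
    (10, 11,   "EX_SEND_BOTTOM"),
    (9,  10,   "EX_SPLIT_BOTTOM"),
]

# Group operators under their parent TDB ID.
children = {}
for op_id, pa_id, name in rows:
    children.setdefault(pa_id, []).append((op_id, name))


def show(pa_id=None, depth=0):
    """Print the plan tree, indenting each operator under its parent."""
    for op_id, name in children.get(pa_id, []):
        print("  " * depth + f"{op_id} {name}")
        show(op_id, depth + 1)


show()
```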

During query execution, you will see these metrics changing continuously
for all of the operators as data flows through them, until a blocking
operator is encountered in the plan.  The blocking operators are EX_SORT,
EX_HASH_JOIN, and EX_HASH_GRBY.

The operator CPU time is the sum of the CPU time spent in the operator in
the executor thread of all the processes hosting the operator.  Operator
CPU time is real, measured in microseconds; it is NOT a relative number.
It doesn't include the CPU time spent by other threads executing tasks on
behalf of the executor thread.  Usually, a Trafodion executor instance runs
in a single thread, and the engine can have multiple executor instances
running in a process to support multi-threaded client applications.  Most
notably, the Trafodion engine uses a thread pool to pre-fetch rows while
rows are fetched sequentially.  It is also possible that HBase uses thread
pools to complete the operations requested by Trafodion.  These thread
timings are not included in the operator CPU time.  To account for this,
RMS provides additional counters in a different view: the pertable view.



GET STATISTICS FOR QID  PERTABLE provides the following counters:



HBase/Hive IOs
  Number of messages sent to the HBase Region Servers (RS).

HBase/Hive IO MBytes
  The cumulative size of these messages in MB, accounted at the Trafodion
  layer.

HBase/Hive Sum IO Time
  The cumulative time taken, in microseconds, by the RS to respond, summed
  across all ESPs.

HBase/Hive Max IO Time
  The maximum over all ESPs of the cumulative time taken, in microseconds,
  by the RS to respond.  This gives an indication of how much of the
  elapsed time is spent in HBase, because the messages to the RS are
  blocking.

Both the Sum and Max IO times are elapsed times, measured as wall-clock
time in microseconds.



The max IO time should be less than the elapsed (response) time of the
query.  If the max IO time is close to the elapsed time, then most of the
time is spent in HBase.

The sum IO time should be less than DOP * elapsed time.

The operator time is CPU time.
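Those rules can be turned into a quick sanity check.  A sketch with made-up numbers (all times in microseconds):

```python
# Sanity checks relating the RMS pertable IO counters to a query's
# elapsed time, following the rules above.  All numbers are invented
# for illustration; times are in microseconds.
dop = 10
elapsed_us = 5_000_000   # query response time
max_io_us = 4_600_000    # Max IO Time: largest per-ESP cumulative RS wait
sum_io_us = 32_000_000   # Sum IO Time: RS wait summed across all ESPs

assert max_io_us <= elapsed_us         # max IO time bounded by elapsed time
assert sum_io_us <= dop * elapsed_us   # sum IO time bounded by DOP * elapsed

# If the max IO time is close to the elapsed time, most of the query's
# time is being spent waiting on HBase.
hbase_fraction = max_io_us / elapsed_us
print(f"time spent in HBase: ~{hbase_fraction:.0%} of elapsed time")
```

With these numbers, roughly 92% of the elapsed time is attributable to HBase, so tuning effort would go to the storage side rather than the SQL operators.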

I sincerely hope you will find the above information useful in digesting
the output from RMS.  I would say that reading, analyzing, and interpreting
RMS output is an art you develop over time, and it is always difficult to
document every usage scenario.  If you find something that needs to be
added or isn't correct, please let us know.



Selva





*From:* Liu, Ming (Ming) [mailto:ming@esgyn.cn]
*Sent:* Tuesday, March 8, 2016 5:41 PM
*To:* user@trafodion.incubator.apache.org
*Subject:* how to tell the most time-consuming part for a given Trafodion
query plan?



Hi, all,



We have been running some complex queries using Trafodion and need to
analyze the performance.  One question: if we want to know which part of
the plan takes the longest time, are there any good tools or techniques to
answer this?



I can use 'get statistics for qid  default' to get runtime stats, but it
is rather hard to interpret the output.  I assume the "Oper CPU Time" is
the best one we can trust?  But I am not sure whether it is pure CPU time
or whether it also includes 'waiting time'.  If I want to know the whole
time an operation takes from start to end, is there any way?

And if it is CPU time, is it in ns or something else, or just a relative
number?



Here is an example output of ‘get statistics’



LC  RC  Id  PaId  ExId  Frag  TDB Name         DOP  Dispatches  Oper CPU Time  Est. Records Used  Act. Records Used  Details

12  .   13  .     7     0     EX_ROOT          1    1           69             0                  0                  1945
11  .   12  13    6     0     EX_SPLIT_TOP     1    1           32             99,550,560         0
10  .   11  12    6     0     EX_SEND_TOP      10   32          1,844          99,550,560         0
9   .   10  11    6     2     EX_SEND_BOTTOM   10   20          666            99,550,560         0
8   .   9   10    6     2     EX_SPLIT_BOTTOM  10   40          411            99,550,560         0                  53670501
6   7   8   9     5     2     EX_TUPLE_FLOW    10   10          57             99,550,560         0

how to tell the most time-consuming part for a given Trafodion query plan?

2016-03-08 Thread Liu, Ming (Ming)
Hi, all,

We have been running some complex queries using Trafodion and need to
analyze the performance.  One question: if we want to know which part of
the plan takes the longest time, are there any good tools or techniques to
answer this?

I can use 'get statistics for qid  default' to get runtime stats, but it
is rather hard to interpret the output.  I assume the "Oper CPU Time" is
the best one we can trust?  But I am not sure whether it is pure CPU time
or whether it also includes 'waiting time'.  If I want to know the whole
time an operation takes from start to end, is there any way?
And if it is CPU time, is it in ns or something else, or just a relative
number?

Here is an example output of 'get statistics'

LC  RC  Id  PaId  ExId  Frag  TDB Name                  DOP  Dispatches  Oper CPU Time  Est. Records Used  Act. Records Used  Details

12  .   13  .     7     0     EX_ROOT                   1    1           69             0                  0                  1945
11  .   12  13    6     0     EX_SPLIT_TOP              1    1           32             99,550,560         0
10  .   11  12    6     0     EX_SEND_TOP               10   32          1,844          99,550,560         0
9   .   10  11    6     2     EX_SEND_BOTTOM            10   20          666            99,550,560         0
8   .   9   10    6     2     EX_SPLIT_BOTTOM           10   40          411            99,550,560         0                  53670501
6   7   8   9     5     2     EX_TUPLE_FLOW             10   10          57             99,550,560         0
.   .   7   8     4     2     EX_TRAF_LOAD_PREPARATION  10   0           0              1                  0                  TRAFODION.SEABASE.BLTEST|0|0
5   .   6   8     3     2     EX_SORT                   10   316,410     40,033,167     99,550,560         0                  0|15880|10
4   .   5   6     2     2     EX_SPLIT_TOP              10   316,411     559,691        99,550,560         5,690,184
3   .   4   5     2     2     EX_SEND_TOP               160  474,849     13,076,509     99,550,560         5,690,196
2   .   3   4     2     3     EX_SEND_BOTTOM            160  919,425     90,107,363     99,550,560         5,695,235
1   .   2   3     2     3     EX_SPLIT_BOTTOM           16   94,836      4,236,816      99,550,560         5,698,863          350792654
.   .   1   2     1     3     EX_HDFS_SCAN              16   48,227      256,448,475    0                  5,715,193          HIVE.BLTEST|5715193|1664264993

Thanks in advance.

Thanks,
Ming



RE: Apache Trafodion At San Jose Strata + Hadoop World Developer Showcase!

2016-03-08 Thread Liu, Ming (Ming)
Great news, and I hope Trafodion becomes known to more and more people!

From: Carol Pearson [mailto:carol.pearson...@gmail.com]
Sent: March 9, 2016, 1:42
To: user@trafodion.incubator.apache.org
Subject: Apache Trafodion At San Jose Strata + Hadoop World Developer Showcase!

Hi Trafodion Fans,

Great news if you're going to Strata + Hadoop World in San Jose at the end of
March.  Apache Trafodion was selected to be part of the Developer Showcase on
Wednesday, 30 March!  Stop by to see Apache Trafodion in action and to talk to
some of the people in the Trafodion community in person.

This is also a great opportunity to get some ideas on how you could join in on 
the Trafodion fun!

-Carol P.
---
Email:carol.pearson...@gmail.com
Twitter:  @CarolP222
---


Re: perl-DBD-SQLite*

2016-03-08 Thread Carol Pearson
Hi Amanda,

At one point, I know we used SQLite for some internal configuration
information, but I've lost track of whether or not we still do.  Otherwise,
SQLite would only be needed as a dependency, and at that point we'd have to
track it down to see what's really needed.

If we don't install the full set, does the install complete and does
Trafodion start?  No guarantees that we don't have a problem even if it
installs and starts, because the dependency could be later in the execution
path.  But if install/start fails, at least that tells us that the
dependency matters and points us to at least one place *where* something
cares.

-Carol P.

---
Email:carol.pearson...@gmail.com
Twitter:  @CarolP222
---

On Tue, Mar 8, 2016 at 2:55 PM, Amanda Moran  wrote:

> Hi there All-
>
> In the current installer we try to install this package: perl-DBD-SQLite*
> (note the *).  On RHEL 6 and CentOS 6 this has worked fine.
>
> I am testing the installer on RHEL 7.1, and it is not able to install
> perl-DBD-SQLite*, only perl-DBD-SQLite.
>
> Is just installing perl-DBD-SQLite going to be an issue?
>
> Thanks!
>
> --
> Thanks,
>
> Amanda Moran
>


Request for information about the installer's new management nodes prompt.

2016-03-08 Thread D. Markt
Hi,

  Some time back I noticed a new prompt as I was installing a Trafodion
build:

Do you have a set of management nodes (Y/N), default is N:

  I was wondering how to appropriately answer this prompt to use the new
feature as it was intended.  For example, one cluster is using the HA
configuration and has only the HBase Master and HDFS Name Node processes
running on two nodes.  Several questions come to mind:

  1) Is the expectation that those nodes will be entered at this prompt?

  2) Will entering a node on this line preclude certain processes from
running on that node, for example:
   a) Are mxosrvrs still started on those nodes?
   b) Are other Trafodion processes started on those nodes?

  3) Are there other considerations as to which nodes should or should not
be listed as management nodes?

  Any insights will be helpful and appreciated.

Thanks,
Dennis




Apache Trafodion At San Jose Strata + Hadoop World Developer Showcase!

2016-03-08 Thread Carol Pearson
Hi Trafodion Fans,

Great news if you're going to Strata + Hadoop World in San Jose at the end
of March.  Apache Trafodion was selected to be part of the Developer
Showcase on Wednesday, 30 March!  Stop by to see Apache Trafodion in action
and to talk to some of the people in the Trafodion community in person.

This is also a great opportunity to get some ideas on how you could join in
on the Trafodion fun!

-Carol P.
---
Email:carol.pearson...@gmail.com
Twitter:  @CarolP222
---