[jira] [Assigned] (TRAFODION-2595) Query result with cqd HBASE_DOP_PARALLEL_SCANNER ‘1.0’ not correct

2017-04-18 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi reassigned TRAFODION-2595:
--

Assignee: Eric Owhadi

> Query result with cqd HBASE_DOP_PARALLEL_SCANNER ‘1.0’ not correct
> --
>
> Key: TRAFODION-2595
> URL: https://issues.apache.org/jira/browse/TRAFODION-2595
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: -exe
>Affects Versions: 2.1-incubating
>Reporter: Yuan Liu
>Assignee: Eric Owhadi
> Fix For: any
>
>
> With cqd HBASE_DOP_PARALLEL_SCANNER ‘1.0’, the query executes very fast, but 
> the result is incorrect.
> Below are the results of one test query:
> --Without cqd HBASE_DOP_PARALLEL_SCANNER ‘1.0’
> --- 12315 row(s) selected.
> Start Time 2017/04/18 13:15:52.946896
> End Time   2017/04/18 13:51:46.141087
> Elapsed Time  00:35:53.194191
> Compile Time  00:00:00.070804
> Execution Time00:35:53.123157
> --With cqd HBASE_DOP_PARALLEL_SCANNER ‘1.0’
> --- 3139 row(s) selected.
> Start Time 2017/04/18 11:03:19.265742
> End Time   2017/04/18 11:04:34.434705
> Elapsed Time  00:01:15.168963
> Compile Time  00:00:01.654184
> Execution Time00:01:13.514594
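A minimal way to check this class of problem is to toggle the CQD around the
same query and compare row counts (a sketch; the report's actual test query is
not shown, so <test query> stands in for it):

cqd HBASE_DOP_PARALLEL_SCANNER reset;   -- default: serial scanner
prepare q from <test query>;
execute q;                              -- baseline count (12315 above)
cqd HBASE_DOP_PARALLEL_SCANNER '1.0';   -- enable the parallel scanner
prepare q from <test query>;            -- re-prepare: CQDs apply at compile time
execute q;                              -- wrong count (3139 above)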



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (TRAFODION-2591) Creating INDEX on ADDED column gives corrupted results

2017-04-12 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-2591:
--

 Summary: Creating INDEX on ADDED column gives corrupted results
 Key: TRAFODION-2591
 URL: https://issues.apache.org/jira/browse/TRAFODION-2591
 Project: Apache Trafodion
  Issue Type: Bug
  Components: sql-exe
Affects Versions: 2.2-incubating
Reporter: Eric Owhadi



CREATE TABLE "car" (
":car_id" CHAR(27) CHARACTER SET ISO88591 NOT NULL PRIMARY KEY,
"a" CHAR(27) CHARACTER SET ISO88591 NOT NULL,
"b" DATE,
"c" VARCHAR(30),
"d" DATE,
"e" VARCHAR (30),
"f" CHAR(1) NOT NULL,
":brand" CHAR(1) CHARACTER SET ISO88591,
":horsePower" INT,
":featuresFlag" INT
);

UPSERT INTO TABLE "car" VALUES
('sdcABddOdsSk5DzWZxQFO2BkDLc','bbHABhqOdsSk5DwdEsQFO2BkDss',CURRENT_DATE,'me',CURRENT_DATE,'me',
 'A','a',122,7),
('wwsvchqOhmksh564ZxsdOSasdcx','bbHABhqOdsSk5DwdEsQFO2BkDss',CURRENT_DATE,'me',CURRENT_DATE,'me',
 'A','b',121,3),
('fcxdBhDlsdFc5DzWZaSFO2Bjh67','bbHABhqOdsSk5DwdEsQFO2BkDss',CURRENT_DATE,'me',CURRENT_DATE,'me',
 'A','c',133,2),
('ldsxRdqkKkSk5DzWZxQFOsd7shF','bbHABhqOdsSk5DwdEsQFO2BkDss',CURRENT_DATE,'me',CURRENT_DATE,'me',
 'A','d',201,1);

select ":car_id" from "car";
-- data good

ALTER TABLE "car" ADD COLUMN "fk_owns_person" CHAR(27) CHARACTER SET ISO88591; 

select ":car_id" from "car";
-- data good

CREATE INDEX "car_idx_fk_owns_person" ON "car" ("fk_owns_person");

select ":car_id" from "car";
-- corrupted data



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (TRAFODION-2512) index access with MDAM not chosen where predicate is range spec

2017-02-28 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-2512:
--

 Summary: index access with MDAM not chosen where predicate is 
range spec
 Key: TRAFODION-2512
 URL: https://issues.apache.org/jira/browse/TRAFODION-2512
 Project: Apache Trafodion
  Issue Type: Bug
  Components: sql-cmp
Affects Versions: 2.2-incubating
Reporter: Eric Owhadi


create table tbl (
k1 int not null,
k2 int not null,
ts timestamp not null,
a char(10),
b varchar(30),
c largeint,
primary key (k1,k2,ts))
salt using 8 partitions 
division by (date_trunc('MONTH', ts)) ;

upsert using load into tbl
select num/1000, num, DATEADD(SECOND,-num,CURRENT_TIMESTAMP),cast(num as 
char(10)), cast(num as varchar(30)), num*1000
from (select 
1000*x1000+100*x100+10*x10+1*x1+1000*x1000+100*x100+10*x10+x1
 as num
  from (values (0)) seed(c)
  transpose 0,1,2,3,4,5,6,7,8,9 as x1
  transpose 0,1,2,3,4,5,6,7,8,9 as x10
  transpose 0,1,2,3,4,5,6,7,8,9 as x100
  transpose 0,1,2,3,4,5,6,7,8,9 as x1000
  transpose 0,1,2,3,4,5,6,7,8,9 as x1
  transpose 0,1,2,3,4,5,6,7,8,9 as x10  
  transpose 0,1,2,3,4,5,6,7,8,9 as x100
  transpose 0,1,2,3,4,5,6,7,8,9 as x1000
) T
;

create index tbl_idx_b on tbl(b) salt like table;

update statistics for table tbl on every column sample;

prepare s from select k1 from tbl where b = '1234567';

prepare ss from select k1 from tbl where b like '1234567%';

See how s correctly picks index access.
See how ss, even though the LIKE is correctly transformed into a range spec, 
ends up doing a full main-table scan instead of going after the index on b 
using MDAM and the range spec inside the MDAM disjunct.

SQL>prepare s from select k1 from tbl where b = '1234567';

--- SQL command prepared.

SQL>explain options 'f' s;

 
LC   RC   OP   OPERATOR              OPT   DESCRIPTION   CARD
---- ---- ---- --------------------  ----  ------------  ---------

1    .    2    root                                      1.00E+000
.    .    1    trafodion_index_scan        IDX_TBL_B     1.00E+000

--- SQL operation complete.

SQL>explain s;

 
-- PLAN SUMMARY
MODULE_NAME .. DYNAMICALLY COMPILED
STATEMENT_NAME ... S
PLAN_ID .. 212355075543213868
ROWS_OUT . 1
EST_TOTAL_COST ... 0.15
STATEMENT  select k1 from tbl where b = '1234567'
 
 
-- NODE LISTING
ROOT ==  SEQ_NO 2        ONLY CHILD 1
REQUESTS_IN .. 1
ROWS_OUT . 1
EST_OPER_COST  0
EST_TOTAL_COST ... 0.15
DESCRIPTION
  max_card_est ... 1
  fragment_id  0
  parent_frag  (none)
  fragment_type .. master
  statement_index  0
  affinity_value . 0
  max_max_cardinality  1
  total_overflow_size  0.00 KB
  xn_access_mode . read_only
  xn_autoabort_interval0
  auto_query_retry ... enabled
  plan_version ... 2,600
  embedded_arkcmp  used
  ObjectUIDs . 636255280475776270
  select_list  TRAFODION.ERIC.IDX_TBL_B.K1
  input_variables  %('1234567')
 
 
TRAFODION_INDEX_SCAN ==  SEQ_NO 1        NO CHILDREN
TABLE_NAME ... TBL
REQUESTS_IN .. 1
ROWS_OUT . 1
EST_OPER_COST  0.15
EST_TOTAL_COST ... 0.15
DESCRIPTION
  max_card_est ... 1
  fragment_id  0
  parent_frag  (none)
  fragment_type .. master
  scan_type .. subset scan limited by mdam of index
 TRAFODION.ERIC.IDX_TBL_B(TRAFODION.ERIC.TBL)
  object_type  Trafodion
  cache_size ... 100
  probes . 1
  rows_accessed .. 1
  column_retrieved ... #1:1
  key_columns  TRAFODION.ERIC.IDX_TBL_B._SALT_,
 TRAFODION.ERIC.IDX_TBL_B.B,
 TRAFODION.ERIC.IDX_TBL_B._DIVISION_1_,
 TRAFODION.ERIC.IDX_TBL_B.K1,
 TRAFODION.ERIC.IDX_TBL_B.K2,
 TRAFODION.ERIC.IDX_TBL_B.TS
  mdam_disjunct .. (TRAFODION.ERIC.IDX_TBL_B.B = %('1234567'))

--- SQL operation complete.

SQL>prepare ss from select k1 from tbl where b like '1234567%';

--- SQL command prepared.

SQL>explain options 'f' ss;

 
LC   RC   OP   OPERATOR              OPT   DESCRIPTION   CARD
---- ---- ---- --------------------  ----  ------------  ---------

1    .    2    root                                      6.25E+006
.    .    1    trafodion_index_scan        IDX_TBL_B     6.2

[jira] [Created] (TRAFODION-2488) merge statement with where clause refuses to work in Batch mode

2017-02-16 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-2488:
--

 Summary: merge statement with where clause refuses to work in Batch 
mode
 Key: TRAFODION-2488
 URL: https://issues.apache.org/jira/browse/TRAFODION-2488
 Project: Apache Trafodion
  Issue Type: Bug
  Components: client-jdbc-t4, connectivity-general
Reporter: Eric Owhadi


merge into t on k1 = ? and k2 = ? 
when matched then update set (a,b,c,d) = (?,?,?,?) where c < ? 
when not matched then insert (k1,k2,a,b,c,d) values (?,?,?,?,?,?);

This refuses to run in batched mode using the JDBC Type 4 driver. The error is:
*** ERROR[30019] Statement was compiled with scalar parameters and array values 
used during execution.

workaround:
merge into t on k1 = ? and k2 = ? 
when matched then update set (a,b,c,d) = (?,?,?,?) where c < TIMESTAMP 
'2014-01-27 17:11:10' 
when not matched then insert (k1,k2,a,b,c,d) values (?,?,?,?,?,?);

Removing the parametric where clause resolves the problem, when we can do so.

DDL:
CREATE TABLE t
  ( 
k1 CHAR(31) CHARACTER SET ISO88591 COLLATE
  DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED
  , k2 VARCHAR(256 CHARS) CHARACTER SET UTF8
  COLLATE DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED
  , a LARGEINT DEFAULT NULL NOT SERIALIZED
  , b VARCHAR(50) CHARACTER SET ISO88591 COLLATE
  DEFAULT DEFAULT NULL NOT SERIALIZED
  , c TIMESTAMP(6) NO DEFAULT NOT NULL NOT
  DROPPABLE NOT SERIALIZED
  , d CHAR(1) CHARACTER SET ISO88591 COLLATE
  DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED
  , PRIMARY KEY (k1 ASC, k2 ASC)
  )
  SALT USING 8 PARTITIONS
   ON (k1)
 ATTRIBUTES ALIGNED FORMAT 
  HBASE_OPTIONS 
  ( 
DATA_BLOCK_ENCODING = 'FAST_DIFF',
COMPRESSION = 'SNAPPY' 
  ) 
;



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (TRAFODION-2464) failure to upsert into a table with an index

2017-01-25 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-2464:
--

 Summary: failure to upsert into a table with an index
 Key: TRAFODION-2464
 URL: https://issues.apache.org/jira/browse/TRAFODION-2464
 Project: Apache Trafodion
  Issue Type: Bug
  Components: sql-cmp
Affects Versions: 2.2-incubating
Reporter: Eric Owhadi
 Attachments: osim.tar

create table files
(
directory_id char(36) NOT NULL,
name varchar(256) NOT NULL,
fsize largeint,
owner varchar(50),
primary key (directory_id,name))
SALT USING 8 partitions on (directory_id)
HBASE_OPTIONS (DATA_BLOCK_ENCODING = 'FAST_DIFF', COMPRESSION = 'SNAPPY');

create index files_idx_by_directory_id on files(name,directory_id)
SALT LIKE TABLE 
HBASE_OPTIONS (DATA_BLOCK_ENCODING = 'FAST_DIFF', COMPRESSION = 'SNAPPY');

prepare s from upsert into files values (?,?,?,?);

*** ERROR[3241] This MERGE statement is not supported. Reason:  Non-unique ON 
clause not allowed with INSERT. [2017-01-20 00:38:54]
*** ERROR[8822] The statement was not prepared. [2017-01-20 00:38:54]





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-2422) populateSortCols was flagged as major perf offender during profiling

2017-01-02 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-2422:
--

 Summary: populateSortCols was flagged as major perf offender 
during profiling
 Key: TRAFODION-2422
 URL: https://issues.apache.org/jira/browse/TRAFODION-2422
 Project: Apache Trafodion
  Issue Type: Improvement
  Components: sql-cmp
Reporter: Eric Owhadi
Priority: Minor
 Fix For: 2.2-incubating


The reason is an unbounded iterative string search on a huge string.
Also added a minor fix: when no category is set in the log configuration file, 
extra code is executed, potentially impacting performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-2416) HBASE OPTIONS on CREATE INDEX have no effect

2016-12-21 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-2416:
--

 Summary: HBASE OPTIONS on CREATE INDEX have no effect
 Key: TRAFODION-2416
 URL: https://issues.apache.org/jira/browse/TRAFODION-2416
 Project: Apache Trafodion
  Issue Type: Bug
  Components: sql-exe
Affects Versions: 2.1-incubating, 2.2-incubating
Reporter: Eric Owhadi


create a table t,
then create an index with syntax like:

CREATE INDEX t_idx_by_a ON t
(a)
SALT LIKE TABLE
HBASE_OPTIONS (DATA_BLOCK_ENCODING = 'FAST_DIFF', COMPRESSION = 'SNAPPY');

then verify using hbase shell and notice that DATA_BLOCK_ENCODING and 
COMPRESSION are set to NONE:

describe "TRAFODION.MYSCHEMA.T_IDX_BY_A"
Table TRAFODION.MYSCHEMA.T_IDX_BY_A is ENABLED   
TRAFODION.MYSCHEMA.T_IDX_BY_A
COLUMN FAMILIES DESCRIPTION 
{NAME => '#1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
 KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE',
 TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536',
 REPLICATION_SCOPE => '0'}
{NAME => 'mt_', BLOOMFILTER => 'ROW', VERSIONS => '2', IN_MEMORY => 'true',
 KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE',
 TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536',
 REPLICATION_SCOPE => '0'}
2 row(s) in 0.0220 seconds


The workaround is to alter the index's column families directly using the hbase shell...




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-2415) wrong plan picked when using predicate on multiple columns of a multi-column INDEX

2016-12-21 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-2415:
--

 Summary: wrong plan picked when using predicate on multiple 
columns of a multi-column INDEX
 Key: TRAFODION-2415
 URL: https://issues.apache.org/jira/browse/TRAFODION-2415
 Project: Apache Trafodion
  Issue Type: Bug
  Components: sql-cmp
Reporter: Eric Owhadi


create table t(
a char(1) not null,
b char(1) not null,
c char(1) not null,
d char(1) not null,
e CHAR(1) NOT NULL,
f SMALLINT UNSIGNED NOT NULL,   
g SMALLINT UNSIGNED NOT NULL,   
h INT UNSIGNED NOT NULL,
customer CHAR(20) NOT NULL,
count INT UNSIGNED,
price LARGEINT,
PRIMARY KEY (a,b,c,d,e,f,g,h,customer)

)
SALT USING 4 PARTITIONS; 

CREATE INDEX t_idx_by_b ON t
(b,count,price);
CREATE INDEX t_idx_by_c ON t
(c,count,price);
CREATE INDEX t_idx_by_d ON t
(d,count,price);
CREATE INDEX t_idx_by_e ON t
(e,count,price);
CREATE INDEX t_idx_by_f ON t
(f,count,price);
CREATE INDEX t_idx_by_g ON t
(g,count,price);
CREATE INDEX t_idx_by_h ON t
(h,count,price);
CREATE INDEX t_idx_by_count ON t
(customer,count,price);


SELECT e, SUM(price)
FROM t
WHERE 
b IN ('1','2','3') 
AND 
f IN (10,20, 30)
GROUP BY 1;

generates a wrong plan doing a full scan on t_idx_by_f,

while
SELECT e, SUM(price)
FROM t
WHERE 
f IN (10,20, 30)
GROUP BY 1;

generates a good plan doing MDAM on t_idx_by_f only.


Using cqd rangespec_transformation 'off'; makes the problem go away.
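A quick way to verify the workaround is to re-prepare the failing query after
setting the CQD and check the plan (a sketch reusing the report's own query):

cqd rangespec_transformation 'off';
prepare sfix from
select e, sum(price)
from t
where b in ('1','2','3') and f in (10,20,30)
group by 1;
explain options 'f' sfix;
-- expect an MDAM-limited index scan instead of the full scan on t_idx_by_f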



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-2413) row total length is wrongly calculated in Aligned format

2016-12-21 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-2413:
--

 Summary: row total length is wrongly calculated in Aligned format
 Key: TRAFODION-2413
 URL: https://issues.apache.org/jira/browse/TRAFODION-2413
 Project: Apache Trafodion
  Issue Type: Bug
  Components: sql-cmp
Affects Versions: 2.1-incubating, 2.2-incubating
Reporter: Eric Owhadi


ROW_TOTAL_LENGTH, as reported in DB Manager or retrieved with the following 
query, is wrong for aligned-format tables. The logic wrongly assumes that the 
table uses a non-aligned format. This affects more than just reporting: 
internal buffers are wrongly over-inflated, and BMO operators can spill to 
disk earlier than necessary:
set schema "_MD_";
-- Get the Object UID
select * from objects where schema_name = 'MYSCHEMA' and object_name = 
'MYTABLE';
select * from tables where table_uid = 3146444098927464648;
-- shows 630 as total row length
-- display columns
select column_name, column_number, sql_data_type, column_size
from columns
where object_uid = 3146444098927464648
order by 2;
-- shows same info as DBManager

TABLE_UID            ROW_FORMAT  IS_AUDITED  ROW_DATA_LENGTH  ROW_TOTAL_LENGTH  KEY_LENGTH  NUM_SALT_PARTNS  FLAGS
-------------------  ----------  ----------  ---------------  ----------------  ----------  ---------------  -----
8556393311998525665  AF          Y           51               630               374         0
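A rough sanity check (a sketch; the object UID is the one queried above, and
NULL/VARCHAR overhead is ignored) is to compare the sum of the column sizes
against the reported ROW_TOTAL_LENGTH:

select sum(column_size) as sum_col_sizes
from "_MD_".columns
where object_uid = 3146444098927464648;
-- for an aligned-format row this should be in the ballpark of
-- ROW_DATA_LENGTH, far below the reported ROW_TOTAL_LENGTH of 630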





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-2401) Creating Foreign Key constraint NOT ENFORCED still creates a system index

2016-12-16 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-2401:
--

 Summary: Creating Foreign Key constraint NOT ENFORCED still creates a 
system index
 Key: TRAFODION-2401
 URL: https://issues.apache.org/jira/browse/TRAFODION-2401
 Project: Apache Trafodion
  Issue Type: Improvement
  Components: sql-cmp
Affects Versions: 2.2-incubating
Reporter: Eric Owhadi


If you create a table and declare a foreign key NOT ENFORCED (to help the 
optimizer make good plan-selection decisions while skipping FK checking to 
improve ingestion performance), Trafodion still creates a system index, which 
hurts ingest performance. The index is never used, since the FK is not 
enforced, but we still pay the cost of index maintenance.
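A minimal sketch of the scenario (table and constraint names invented for
illustration):

create table parent (pk int not null primary key);
create table child
( ck int not null primary key
, fk int
, constraint child_fk foreign key (fk)
references parent(pk) not enforced
);
-- showddl child;
-- reportedly this still lists a system-created index supporting the FK,
-- even though the constraint is NOT ENFORCED and the index is never used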



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (TRAFODION-1421) Implement parallel Scanner primitive

2016-07-06 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi closed TRAFODION-1421.
--

> Implement parallel Scanner primitive
> 
>
> Key: TRAFODION-1421
> URL: https://issues.apache.org/jira/browse/TRAFODION-1421
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-cmp, sql-exe
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: performance
> Fix For: 2.1-incubating
>
>
> The ClientScanner API is serial, to preserve key ordering. However, many 
> operators don't care about ordering and would rather get the scan results 
> fast, regardless of order. This JIRA is about providing a parallel scanner 
> that takes care of splitting the work evenly between all region servers 
> where possible. HBase has had a parallel scanner in the pipe for quite some 
> time (HBASE-9272), but that work has been stalled since October 2013. 
> However, looking at the available code, it looks like a big part of it can 
> be leveraged without requiring a custom HBase build. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TRAFODION-1421) Implement parallel Scanner primitive

2016-07-06 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi resolved TRAFODION-1421.

Resolution: Fixed

> Implement parallel Scanner primitive
> 
>
> Key: TRAFODION-1421
> URL: https://issues.apache.org/jira/browse/TRAFODION-1421
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-cmp, sql-exe
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: performance
> Fix For: 2.1-incubating
>
>
> The ClientScanner API is serial, to preserve key ordering. However, many 
> operators don't care about ordering and would rather get the scan results 
> fast, regardless of order. This JIRA is about providing a parallel scanner 
> that takes care of splitting the work evenly between all region servers 
> where possible. HBase has had a parallel scanner in the pipe for quite some 
> time (HBASE-9272), but that work has been stalled since October 2013. 
> However, looking at the available code, it looks like a big part of it can 
> be leveraged without requiring a custom HBase build. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (TRAFODION-2009) parallel scanner failing on tpcds.store_sales table

2016-06-01 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi closed TRAFODION-2009.
--

> parallel scanner failing on tpcds.store_sales table
> ---
>
> Key: TRAFODION-2009
> URL: https://issues.apache.org/jira/browse/TRAFODION-2009
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: sql-exe
>Affects Versions: 2.1-incubating
> Environment: on dev workstation
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>Priority: Minor
> Fix For: 2.1-incubating
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> CREATE TABLE TRAFODION.SEABASE.STORE_SALES
>   (
> SS_SOLD_DATE_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_ITEM_SK   INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
>   SERIALIZED
>   , SS_TICKET_NUMBER INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
>   SERIALIZED
>   , SS_SOLD_TIME_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_CUSTOMER_SK   INT DEFAULT NULL NOT SERIALIZED
>   , SS_CDEMO_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_HDEMO_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_ADDR_SK   INT DEFAULT NULL NOT SERIALIZED
>   , SS_STORE_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_PROMO_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_QUANTITY  INT DEFAULT NULL NOT SERIALIZED
>   , SS_WHOLESALE_COST      REAL DEFAULT NULL NOT SERIALIZED
>   , SS_LIST_PRICE          REAL DEFAULT NULL NOT SERIALIZED
>   , SS_SALES_PRICE         REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_DISCOUNT_AMT    REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_SALES_PRICE     REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_WHOLESALE_COST  REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_LIST_PRICE      REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_TAX             REAL DEFAULT NULL NOT SERIALIZED
>   , SS_COUPON_AMT          REAL DEFAULT NULL NOT SERIALIZED
>   , SS_NET_PAID            REAL DEFAULT NULL NOT SERIALIZED
>   , SS_NET_PAID_INC_TAX    REAL DEFAULT NULL NOT SERIALIZED
>   , SS_NET_PROFIT          REAL DEFAULT NULL NOT SERIALIZED
>   , PRIMARY KEY (SS_SOLD_DATE_SK ASC, SS_ITEM_SK ASC, SS_TICKET_NUMBER ASC)
>   )
>   SALT USING 8 PARTITIONS
>ON (SS_ITEM_SK, SS_TICKET_NUMBER)
>  ATTRIBUTES ALIGNED FORMAT
>   HBASE_OPTIONS
>   (
> DATA_BLOCK_ENCODING = 'FAST_DIFF',
> BLOCKSIZE = '131072'
>   )
> ;
> load into store_sales select
>SS_SOLD_DATE_SK 
>   , SS_ITEM_SK   
>   , SS_TICKET_NUMBER
>   , SS_SOLD_TIME_SK
>   , SS_CUSTOMER_SK 
>   , SS_CDEMO_SK  
>   , SS_HDEMO_SK
>   , SS_ADDR_SK
>   , SS_STORE_SK   
>   , SS_PROMO_SK  
>   , SS_QUANTITY 
>   , SS_WHOLESALE_COST  
>   , SS_LIST_PRICE   
>   , SS_SALES_PRICE  
>   , SS_EXT_DISCOUNT_AMT  
>   , SS_EXT_SALES_PRICE  
>   , SS_EXT_WHOLESALE_COST   
>   , SS_EXT_LIST_PRICE
>   , SS_EXT_TAX   
>   , SS_COUPON_AMT
>   , SS_NET_PAID  
>   , SS_NET_PAID_INC_TAX  
>   , SS_NET_PROFIT 
> from hive.hive.store_sales;
> set statistics on;
> cqd parallel_num_esps '1';
> cqd hbase_dop_parallel_scanner '1.0';
> prepare xx from select count(*) from store_sales where ss_customer_sk between 
> 1000 and 2;
> execute xx;
> the query returns a wrong count.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TRAFODION-2009) parallel scanner failing on tpcds.store_sales table

2016-06-01 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi resolved TRAFODION-2009.

   Resolution: Fixed
Fix Version/s: 2.1-incubating

> parallel scanner failing on tpcds.store_sales table
> ---
>
> Key: TRAFODION-2009
> URL: https://issues.apache.org/jira/browse/TRAFODION-2009
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: sql-exe
>Affects Versions: 2.1-incubating
> Environment: on dev workstation
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>Priority: Minor
> Fix For: 2.1-incubating
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> CREATE TABLE TRAFODION.SEABASE.STORE_SALES
>   (
> SS_SOLD_DATE_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_ITEM_SK   INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
>   SERIALIZED
>   , SS_TICKET_NUMBER INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
>   SERIALIZED
>   , SS_SOLD_TIME_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_CUSTOMER_SK   INT DEFAULT NULL NOT SERIALIZED
>   , SS_CDEMO_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_HDEMO_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_ADDR_SK   INT DEFAULT NULL NOT SERIALIZED
>   , SS_STORE_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_PROMO_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_QUANTITY  INT DEFAULT NULL NOT SERIALIZED
>   , SS_WHOLESALE_COST      REAL DEFAULT NULL NOT SERIALIZED
>   , SS_LIST_PRICE          REAL DEFAULT NULL NOT SERIALIZED
>   , SS_SALES_PRICE         REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_DISCOUNT_AMT    REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_SALES_PRICE     REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_WHOLESALE_COST  REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_LIST_PRICE      REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_TAX             REAL DEFAULT NULL NOT SERIALIZED
>   , SS_COUPON_AMT          REAL DEFAULT NULL NOT SERIALIZED
>   , SS_NET_PAID            REAL DEFAULT NULL NOT SERIALIZED
>   , SS_NET_PAID_INC_TAX    REAL DEFAULT NULL NOT SERIALIZED
>   , SS_NET_PROFIT          REAL DEFAULT NULL NOT SERIALIZED
>   , PRIMARY KEY (SS_SOLD_DATE_SK ASC, SS_ITEM_SK ASC, SS_TICKET_NUMBER ASC)
>   )
>   SALT USING 8 PARTITIONS
>ON (SS_ITEM_SK, SS_TICKET_NUMBER)
>  ATTRIBUTES ALIGNED FORMAT
>   HBASE_OPTIONS
>   (
> DATA_BLOCK_ENCODING = 'FAST_DIFF',
> BLOCKSIZE = '131072'
>   )
> ;
> load into store_sales select
>SS_SOLD_DATE_SK 
>   , SS_ITEM_SK   
>   , SS_TICKET_NUMBER
>   , SS_SOLD_TIME_SK
>   , SS_CUSTOMER_SK 
>   , SS_CDEMO_SK  
>   , SS_HDEMO_SK
>   , SS_ADDR_SK
>   , SS_STORE_SK   
>   , SS_PROMO_SK  
>   , SS_QUANTITY 
>   , SS_WHOLESALE_COST  
>   , SS_LIST_PRICE   
>   , SS_SALES_PRICE  
>   , SS_EXT_DISCOUNT_AMT  
>   , SS_EXT_SALES_PRICE  
>   , SS_EXT_WHOLESALE_COST   
>   , SS_EXT_LIST_PRICE
>   , SS_EXT_TAX   
>   , SS_COUPON_AMT
>   , SS_NET_PAID  
>   , SS_NET_PAID_INC_TAX  
>   , SS_NET_PROFIT 
> from hive.hive.store_sales;
> set statistics on;
> cqd parallel_num_esps '1';
> cqd hbase_dop_parallel_scanner '1.0';
> prepare xx from select count(*) from store_sales where ss_customer_sk between 
> 1000 and 2;
> execute xx;
> the query returns a wrong count.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-2009) parallel scanner failing on tpcds.store_sales table

2016-05-23 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297190#comment-15297190
 ] 

Eric Owhadi commented on TRAFODION-2009:


Set severity to minor because the parallel scanner is an experimental feature 
disabled by default.

> parallel scanner failing on tpcds.store_sales table
> ---
>
> Key: TRAFODION-2009
> URL: https://issues.apache.org/jira/browse/TRAFODION-2009
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: sql-exe
>Affects Versions: 2.1-incubating
> Environment: on dev workstation
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> CREATE TABLE TRAFODION.SEABASE.STORE_SALES
>   (
> SS_SOLD_DATE_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_ITEM_SK   INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
>   SERIALIZED
>   , SS_TICKET_NUMBER INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
>   SERIALIZED
>   , SS_SOLD_TIME_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_CUSTOMER_SK   INT DEFAULT NULL NOT SERIALIZED
>   , SS_CDEMO_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_HDEMO_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_ADDR_SK   INT DEFAULT NULL NOT SERIALIZED
>   , SS_STORE_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_PROMO_SK  INT DEFAULT NULL NOT SERIALIZED
>   , SS_QUANTITY  INT DEFAULT NULL NOT SERIALIZED
>   , SS_WHOLESALE_COST      REAL DEFAULT NULL NOT SERIALIZED
>   , SS_LIST_PRICE          REAL DEFAULT NULL NOT SERIALIZED
>   , SS_SALES_PRICE         REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_DISCOUNT_AMT    REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_SALES_PRICE     REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_WHOLESALE_COST  REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_LIST_PRICE      REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_TAX             REAL DEFAULT NULL NOT SERIALIZED
>   , SS_COUPON_AMT          REAL DEFAULT NULL NOT SERIALIZED
>   , SS_NET_PAID            REAL DEFAULT NULL NOT SERIALIZED
>   , SS_NET_PAID_INC_TAX    REAL DEFAULT NULL NOT SERIALIZED
>   , SS_NET_PROFIT          REAL DEFAULT NULL NOT SERIALIZED
>   , PRIMARY KEY (SS_SOLD_DATE_SK ASC, SS_ITEM_SK ASC, SS_TICKET_NUMBER ASC)
>   )
>   SALT USING 8 PARTITIONS
>ON (SS_ITEM_SK, SS_TICKET_NUMBER)
>  ATTRIBUTES ALIGNED FORMAT
>   HBASE_OPTIONS
>   (
> DATA_BLOCK_ENCODING = 'FAST_DIFF',
> BLOCKSIZE = '131072'
>   )
> ;
> load into store_sales select
>SS_SOLD_DATE_SK 
>   , SS_ITEM_SK   
>   , SS_TICKET_NUMBER
>   , SS_SOLD_TIME_SK
>   , SS_CUSTOMER_SK 
>   , SS_CDEMO_SK  
>   , SS_HDEMO_SK
>   , SS_ADDR_SK
>   , SS_STORE_SK   
>   , SS_PROMO_SK  
>   , SS_QUANTITY 
>   , SS_WHOLESALE_COST  
>   , SS_LIST_PRICE   
>   , SS_SALES_PRICE  
>   , SS_EXT_DISCOUNT_AMT  
>   , SS_EXT_SALES_PRICE  
>   , SS_EXT_WHOLESALE_COST   
>   , SS_EXT_LIST_PRICE
>   , SS_EXT_TAX   
>   , SS_COUPON_AMT
>   , SS_NET_PAID  
>   , SS_NET_PAID_INC_TAX  
>   , SS_NET_PROFIT 
> from hive.hive.store_sales;
> set statistics on;
> cqd parallel_num_esps '1';
> cqd hbase_dop_parallel_scanner '1.0';
> prepare xx from select count(*) from store_sales where ss_customer_sk between 
> 1000 and 2;
> execute xx;
> the query returns a wrong count.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-2009) parallel scanner failing on tpcds.store_sales table

2016-05-23 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-2009:
--

 Summary: parallel scanner failing on tpcds.store_sales table
 Key: TRAFODION-2009
 URL: https://issues.apache.org/jira/browse/TRAFODION-2009
 Project: Apache Trafodion
  Issue Type: Bug
  Components: sql-exe
Affects Versions: 2.1-incubating
 Environment: on dev workstation
Reporter: Eric Owhadi
Assignee: Eric Owhadi
Priority: Minor


CREATE TABLE TRAFODION.SEABASE.STORE_SALES
  (
SS_SOLD_DATE_SK  INT DEFAULT NULL NOT SERIALIZED
  , SS_ITEM_SK   INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
  SERIALIZED
  , SS_TICKET_NUMBER INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
  SERIALIZED
  , SS_SOLD_TIME_SK  INT DEFAULT NULL NOT SERIALIZED
  , SS_CUSTOMER_SK   INT DEFAULT NULL NOT SERIALIZED
  , SS_CDEMO_SK  INT DEFAULT NULL NOT SERIALIZED
  , SS_HDEMO_SK  INT DEFAULT NULL NOT SERIALIZED
  , SS_ADDR_SK   INT DEFAULT NULL NOT SERIALIZED
  , SS_STORE_SK  INT DEFAULT NULL NOT SERIALIZED
  , SS_PROMO_SK  INT DEFAULT NULL NOT SERIALIZED
  , SS_QUANTITY  INT DEFAULT NULL NOT SERIALIZED
  , SS_WHOLESALE_COST      REAL DEFAULT NULL NOT SERIALIZED
  , SS_LIST_PRICE          REAL DEFAULT NULL NOT SERIALIZED
  , SS_SALES_PRICE         REAL DEFAULT NULL NOT SERIALIZED
  , SS_EXT_DISCOUNT_AMT    REAL DEFAULT NULL NOT SERIALIZED
  , SS_EXT_SALES_PRICE     REAL DEFAULT NULL NOT SERIALIZED
  , SS_EXT_WHOLESALE_COST  REAL DEFAULT NULL NOT SERIALIZED
  , SS_EXT_LIST_PRICE      REAL DEFAULT NULL NOT SERIALIZED
  , SS_EXT_TAX             REAL DEFAULT NULL NOT SERIALIZED
  , SS_COUPON_AMT          REAL DEFAULT NULL NOT SERIALIZED
  , SS_NET_PAID            REAL DEFAULT NULL NOT SERIALIZED
  , SS_NET_PAID_INC_TAX    REAL DEFAULT NULL NOT SERIALIZED
  , SS_NET_PROFIT          REAL DEFAULT NULL NOT SERIALIZED
  , PRIMARY KEY (SS_SOLD_DATE_SK ASC, SS_ITEM_SK ASC, SS_TICKET_NUMBER ASC)
  )
  SALT USING 8 PARTITIONS
   ON (SS_ITEM_SK, SS_TICKET_NUMBER)
 ATTRIBUTES ALIGNED FORMAT
  HBASE_OPTIONS
  (
DATA_BLOCK_ENCODING = 'FAST_DIFF',
BLOCKSIZE = '131072'
  )
;

load into store_sales select
   SS_SOLD_DATE_SK 
  , SS_ITEM_SK   
  , SS_TICKET_NUMBER
  , SS_SOLD_TIME_SK
  , SS_CUSTOMER_SK 
  , SS_CDEMO_SK  
  , SS_HDEMO_SK
  , SS_ADDR_SK
  , SS_STORE_SK   
  , SS_PROMO_SK  
  , SS_QUANTITY 
  , SS_WHOLESALE_COST  
  , SS_LIST_PRICE   
  , SS_SALES_PRICE  
  , SS_EXT_DISCOUNT_AMT  
  , SS_EXT_SALES_PRICE  
  , SS_EXT_WHOLESALE_COST   
  , SS_EXT_LIST_PRICE
  , SS_EXT_TAX   
  , SS_COUPON_AMT
  , SS_NET_PAID  
  , SS_NET_PAID_INC_TAX  
  , SS_NET_PROFIT 
from hive.hive.store_sales;

set statistics on;
cqd parallel_num_esps '1';
cqd hbase_dop_parallel_scanner '1.0';
prepare xx from select count(*) from store_sales where ss_customer_sk between 
1000 and 2;
execute xx;

the query returns a wrong count.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1914) optimize "added columns" in indexes

2016-05-18 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289108#comment-15289108
 ] 

Eric Owhadi commented on TRAFODION-1914:


Hi Ming,
No, it is a different thing. In my JIRA, there is no limitation of it being
only for unique indexes.
Hope that makes sense,
Eric



> optimize "added columns" in indexes
> ---
>
> Key: TRAFODION-1914
> URL: https://issues.apache.org/jira/browse/TRAFODION-1914
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-cmp
>Reporter: Eric Owhadi
>
> the current CREATE INDEX feature will always put each column added to the 
> index in the clustering key. But sometimes, users just want to add columns 
> to the index to avoid having to probe back the primary table to fetch just 
> one or two columns. Copying these columns into the index avoids the probe 
> back to the main table and therefore improves performance. The current 
> implementation allows this, but will always put the extra columns in the 
> clustering key. That is not optimal, and very bad in the case of VARCHAR, 
> since VARCHARs are expanded to their max size when part of the clustering 
> key. So this JIRA is about altering the syntax of CREATE INDEX to flag 
> columns that are added but should not be part of the clustering key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1914) optimize "added columns" in indexes

2016-03-30 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1914:
--

 Summary: optimize "added columns" in indexes
 Key: TRAFODION-1914
 URL: https://issues.apache.org/jira/browse/TRAFODION-1914
 Project: Apache Trafodion
  Issue Type: Improvement
  Components: sql-cmp
Reporter: Eric Owhadi


the current CREATE INDEX feature will always put each column added to the index 
in the clustering key. But sometimes, users just want to add columns to the 
index to avoid having to probe back the primary table to fetch just one or two 
columns. Copying these columns into the index avoids the probe back to the 
main table and therefore improves performance. The current implementation 
allows this, but will always put the extra columns in the clustering key. That 
is not optimal, and very bad in the case of VARCHAR, since VARCHARs are 
expanded to their max size when part of the clustering key. So this JIRA is 
about altering the syntax of CREATE INDEX to flag columns that are added but 
should not be part of the clustering key. See the sketch below.
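
To illustrate today's behavior (a sketch; the new syntax this JIRA proposes is
not yet defined):

-- today, every listed column becomes part of the index clustering key:
create index t_idx_covering on t(a, b, c);
-- b and c are included only to make the index covering (no probe back to the
-- base table), yet they land in the key; a VARCHAR there is expanded to its
-- maximum size, which is exactly the waste this JIRA wants to avoid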



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (TRAFODION-1900) Optimize MDAM scans with small scanner

2016-03-22 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi closed TRAFODION-1900.
--

> Optimize MDAM scans with small scanner
> --
>
> Key: TRAFODION-1900
> URL: https://issues.apache.org/jira/browse/TRAFODION-1900
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-exe
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
> Fix For: 2.0-incubating
>
>
> When doing MDAM scans, we perform interlaced scans for the PROBE and for the 
> real scan. The probes always return only 1 row, and then we close the 
> scanner immediately, so they should always use the small scanner. I will 
> make this conditional on the existing CQD HBASE_SMALL_SCANNER (either SYSTEM 
> or ON). In addition, blocks retrieved by a probe should always be cached, 
> since we will receive at least one successful cache hit on the next MDAM 
> scan; therefore forcing caching ON for MDAM probes is a good idea. Again, 
> this forcing will be conditional on HBASE_SMALL_SCANNER SYSTEM or ON.
> Then for the real scan part of MDAM, I will use the following heuristic: if 
> the previous scan fit in one HBase block, then it is likely that the next 
> will also fit in one HBase block, so enable the small scanner for the next 
> scan. Again, all this only if the CQD above is ON or SYSTEM.
> Results of using the small scanner on MDAM when it makes sense showed a 
> 1.39X speed improvement...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TRAFODION-1900) Optimize MDAM scans with small scanner

2016-03-22 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi resolved TRAFODION-1900.

Resolution: Fixed


> Optimize MDAM scans with small scanner
> --
>
> Key: TRAFODION-1900
> URL: https://issues.apache.org/jira/browse/TRAFODION-1900
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-exe
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
> Fix For: 2.0-incubating
>
>
> When doing MDAM scans, we perform interlaced scans for the PROBE and for the 
> real scan. The probes always return only 1 row, and then we close the 
> scanner immediately, so they should always use the small scanner. I will 
> make this conditional on the existing CQD HBASE_SMALL_SCANNER (either SYSTEM 
> or ON). In addition, blocks retrieved by a probe should always be cached, 
> since we will receive at least one successful cache hit on the next MDAM 
> scan; therefore forcing caching ON for MDAM probes is a good idea. Again, 
> this forcing will be conditional on HBASE_SMALL_SCANNER SYSTEM or ON.
> Then for the real scan part of MDAM, I will use the following heuristic: if 
> the previous scan fit in one HBase block, then it is likely that the next 
> will also fit in one HBase block, so enable the small scanner for the next 
> scan. Again, all this only if the CQD above is ON or SYSTEM.
> Results of using the small scanner on MDAM when it makes sense showed a 
> 1.39X speed improvement...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TRAFODION-1900) Optimize MDAM scans with small scanner

2016-03-22 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi updated TRAFODION-1900:
---
Fix Version/s: 2.0-incubating

> Optimize MDAM scans with small scanner
> --
>
> Key: TRAFODION-1900
> URL: https://issues.apache.org/jira/browse/TRAFODION-1900
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-exe
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
> Fix For: 2.0-incubating
>
>
> When doing MDAM scans, we perform interlaced scans for the PROBE and for the 
> real scan. The probes always return only 1 row, and then we close the 
> scanner immediately, so they should always use the small scanner. I will 
> make this conditional on the existing CQD HBASE_SMALL_SCANNER (either SYSTEM 
> or ON). In addition, blocks retrieved by a probe should always be cached, 
> since we will receive at least one successful cache hit on the next MDAM 
> scan; therefore forcing caching ON for MDAM probes is a good idea. Again, 
> this forcing will be conditional on HBASE_SMALL_SCANNER SYSTEM or ON.
> Then for the real scan part of MDAM, I will use the following heuristic: if 
> the previous scan fit in one HBase block, then it is likely that the next 
> will also fit in one HBase block, so enable the small scanner for the next 
> scan. Again, all this only if the CQD above is ON or SYSTEM.
> Results of using the small scanner on MDAM when it makes sense showed a 
> 1.39X speed improvement...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1900) Optimize MDAM scans with small scanner

2016-03-19 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1900:
--

 Summary: Optimize MDAM scans with small scanner
 Key: TRAFODION-1900
 URL: https://issues.apache.org/jira/browse/TRAFODION-1900
 Project: Apache Trafodion
  Issue Type: Improvement
  Components: sql-exe
Reporter: Eric Owhadi
Assignee: Eric Owhadi


When doing MDAM scans, we perform interlaced scans for the PROBE and for the 
real scan. The probes always return only 1 row, and then we close the scanner 
immediately, so they should always use the small scanner. I will make this 
conditional on the existing CQD HBASE_SMALL_SCANNER (either SYSTEM or ON). In 
addition, blocks retrieved by a probe should always be cached, since we will 
receive at least one successful cache hit on the next MDAM scan; therefore 
forcing caching ON for MDAM probes is a good idea. Again, this forcing will be 
conditional on HBASE_SMALL_SCANNER SYSTEM or ON.

Then for the real scan part of MDAM, I will use the following heuristic: if 
the previous scan fit in one HBase block, then it is likely that the next will 
also fit in one HBase block, so enable the small scanner for the next scan. 
Again, all this only if the CQD above is ON or SYSTEM.

Results of using the small scanner on MDAM when it makes sense showed a 1.39X 
speed improvement...
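
Per the report, both optimizations are gated on the existing CQD, e.g.:

cqd HBASE_SMALL_SCANNER 'ON';      -- force small-scanner usage
cqd HBASE_SMALL_SCANNER 'SYSTEM';  -- let the heuristic above decide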



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1863) With hbase_filter_preds set to '2', wrong results are returned for a specific use case.

2016-03-11 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191242#comment-15191242
 ] 

Eric Owhadi commented on TRAFODION-1863:


this closes JIRA 1863

> With hbase_filter_preds set to '2', wrong results are returned for a specific 
> use case.
> ---
>
> Key: TRAFODION-1863
> URL: https://issues.apache.org/jira/browse/TRAFODION-1863
> Project: Apache Trafodion
>  Issue Type: Bug
>Reporter: Selvaganesan Govindarajan
>Assignee: Eric Owhadi
>
> create table t056t57 (a1 numeric(2,2) signed default 0 not null);
> showddl t056t57;
> insert into t056t57 default values;
> select * from t056t57;
> >>select * from t056t57 ;
> A1 
> ---
> .00
> --- 1 row(s) selected.
> >>cqd hbase_filter_preds '2' ;
> --- SQL operation complete.
> >>select * from t056t57 ;
> ..
> --- 0 row(s) selected.
> >>
> This was causing core/TEST056 to fail with PR #340. Possibly a similar issue 
> exists with core/TEST029 too; currently that test case runs with 
> hbase_filter_preds set to 'ON'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TRAFODION-1863) With hbase_filter_preds set to '2', wrong results are returned for a specific use case.

2016-03-07 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi updated TRAFODION-1863:
---

Fixed with PR 364.

> With hbase_filter_preds set to '2', wrong results are returned for a specific 
> use case.
> ---
>
> Key: TRAFODION-1863
> URL: https://issues.apache.org/jira/browse/TRAFODION-1863
> Project: Apache Trafodion
>  Issue Type: Bug
>Reporter: Selvaganesan Govindarajan
>Assignee: Eric Owhadi
>
> create table t056t57 (a1 numeric(2,2) signed default 0 not null);
> showddl t056t57;
> insert into t056t57 default values;
> select * from t056t57;
> >>select * from t056t57 ;
> A1 
> ---
> .00
> --- 1 row(s) selected.
> >>cqd hbase_filter_preds '2' ;
> --- SQL operation complete.
> >>select * from t056t57 ;
> ..
> --- 0 row(s) selected.
> >>
> This was causing core/TEST056 to fail with PR #340. Possibly a similar issue 
> exists with core/TEST029 too; currently that test case runs with 
> hbase_filter_preds set to 'ON'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (TRAFODION-1877) CORE/TEST131 failed when run in standalone

2016-03-04 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi closed TRAFODION-1877.
--
Resolution: Not A Problem

It was the dependency on first running CORE/TEST000 to create the prerequisites.

> CORE/TEST131 failed when run in standalone
> --
>
> Key: TRAFODION-1877
> URL: https://issues.apache.org/jira/browse/TRAFODION-1877
> Project: Apache Trafodion
>  Issue Type: Bug
> Environment: trafodion master pulled on March 4th 2016
>Reporter: Eric Owhadi
>
> $scriptsdir/core/runregr -sb TEST131
> getting this result:
> 18c18,20
> < --- SQL operation complete.
> ---
> > *** ERROR[1390] Object TRAFODION.TRAFODION.T131A already exists in 
> > TRAFODION.
> > 
> > --- SQL operation failed with errors.
> 30c32,34
> < --- SQL operation complete.
> ---
> > *** ERROR[1390] Object TRAFODION.TRAFODION.T131B already exists in 
> > TRAFODION.
> > 
> > --- SQL operation failed with errors.
> 42c46,48
> < --- SQL operation complete.
> ---
> > *** ERROR[1390] Object TRAFODION.TRAFODION.T131C already exists in 
> > TRAFODION.
> > 
> > --- SQL operation failed with errors.
> 46c52,54
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 49c57,59
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 52c62,64
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 71d82
> <  Query_Invalidation_Keys {
> 79c90,92
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 84c97,99
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 87c102,104
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 90c107,109
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 94c113,115
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 97c118,120
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 100c123,125
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 107c132,134
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 110c137,139
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 113c142,144
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 117c148,150
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 120c153,155
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 123c158,160
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 129,132d165
> < *** ERROR[4481] The user does not have SELECT privilege on table or view 
> #CAT.#SCH.T131C.
> < 
> < *** ERROR[8822] The statement was not prepared.
> < 
> 149c182,184
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 152c187,189
> < --- 1 row(s) inserted.
> ---
> > *** ERROR[8102] The operation is prevented by a unique constraint.
> > 
> > --- 0 row(s) inserted.
> 177c214,216
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 181c220,222
> < --- SQL operation complete.
> ---
> > *** ERROR[1222] Command not supported when authorization is not enabled.
> > 
> > --- SQL operation failed with errors.
> 200c241
> <  1

[jira] [Closed] (TRAFODION-1876) CORE/TEST116 fails when run in standalone

2016-03-04 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi closed TRAFODION-1876.
--
Resolution: Not A Bug

CORE/TEST000 must be run first to set up some prerequisites.

> CORE/TEST116 fails when run in standalone
> -
>
> Key: TRAFODION-1876
> URL: https://issues.apache.org/jira/browse/TRAFODION-1876
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: Build Infrastructure
> Environment: using trafodion master of March 4th 2016 and running on 
> a development build machine.
>Reporter: Eric Owhadi
>
> using this command to run test:
> $scriptsdir/core/runregr -sb TEST116
> getting this diff116:
> 454c454
> < *** ERROR[1431] Object #CAT.#SCH.T116T1 exists in HBase. This could be due 
> to a concurrent transactional ddl operation in progress on this table.
> ---
> > *** ERROR[1390] Object TRAFODION.TRAFODION.T116T1 already exists in 
> > TRAFODION.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TRAFODION-1876) CORE/TEST116 fails when run in standalone

2016-03-04 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180119#comment-15180119
 ] 

Eric Owhadi edited comment on TRAFODION-1876 at 3/4/16 4:45 PM:


OK, nice catch, that was it; closing this JIRA...


was (Author: eowhadi):
I did not check; is core/TEST000 supposed to be run when running all tests? 
Because I got errors when running all tests too.
Is TEST000 supposed to create SCH and put it in the defaults table?


> CORE/TEST116 fails when run in standalone
> -
>
> Key: TRAFODION-1876
> URL: https://issues.apache.org/jira/browse/TRAFODION-1876
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: Build Infrastructure
> Environment: using trafodion master of March 4th 2016 and running on 
> a development build machine.
>Reporter: Eric Owhadi
>
> using this command to run test:
> $scriptsdir/core/runregr -sb TEST116
> getting this diff116:
> 454c454
> < *** ERROR[1431] Object #CAT.#SCH.T116T1 exists in HBase. This could be due 
> to a concurrent transactional ddl operation in progress on this table.
> ---
> > *** ERROR[1390] Object TRAFODION.TRAFODION.T116T1 already exists in 
> > TRAFODION.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1876) CORE/TEST116 fails when run in standalone

2016-03-04 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180119#comment-15180119
 ] 

Eric Owhadi commented on TRAFODION-1876:


I did not check; is core/TEST000 supposed to be run when running all tests? 
Because I got errors when running all tests too.
Is TEST000 supposed to create SCH and put it in the defaults table?


> CORE/TEST116 fails when run in standalone
> -
>
> Key: TRAFODION-1876
> URL: https://issues.apache.org/jira/browse/TRAFODION-1876
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: Build Infrastructure
> Environment: using trafodion master of March 4th 2016 and running on 
> a development build machine.
>Reporter: Eric Owhadi
>
> using this command to run test:
> $scriptsdir/core/runregr -sb TEST116
> getting this diff116:
> 454c454
> < *** ERROR[1431] Object #CAT.#SCH.T116T1 exists in HBase. This could be due 
> to a concurrent transactional ddl operation in progress on this table.
> ---
> > *** ERROR[1390] Object TRAFODION.TRAFODION.T116T1 already exists in 
> > TRAFODION.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1877) CORE/TEST131 failed when run in standalone

2016-03-04 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1877:
--

 Summary: CORE/TEST131 failed when run in standalone
 Key: TRAFODION-1877
 URL: https://issues.apache.org/jira/browse/TRAFODION-1877
 Project: Apache Trafodion
  Issue Type: Bug
 Environment: trafodion master pulled on March 4th 2016
Reporter: Eric Owhadi


$scriptsdir/core/runregr -sb TEST131

getting this result:
18c18,20
< --- SQL operation complete.
---
> *** ERROR[1390] Object TRAFODION.TRAFODION.T131A already exists in TRAFODION.
> 
> --- SQL operation failed with errors.
30c32,34
< --- SQL operation complete.
---
> *** ERROR[1390] Object TRAFODION.TRAFODION.T131B already exists in TRAFODION.
> 
> --- SQL operation failed with errors.
42c46,48
< --- SQL operation complete.
---
> *** ERROR[1390] Object TRAFODION.TRAFODION.T131C already exists in TRAFODION.
> 
> --- SQL operation failed with errors.
46c52,54
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
49c57,59
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
52c62,64
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
71d82
<  Query_Invalidation_Keys {
79c90,92
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
84c97,99
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
87c102,104
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
90c107,109
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
94c113,115
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
97c118,120
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
100c123,125
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
107c132,134
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
110c137,139
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
113c142,144
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
117c148,150
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
120c153,155
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
123c158,160
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
129,132d165
< *** ERROR[4481] The user does not have SELECT privilege on table or view 
#CAT.#SCH.T131C.
< 
< *** ERROR[8822] The statement was not prepared.
< 
149c182,184
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
152c187,189
< --- 1 row(s) inserted.
---
> *** ERROR[8102] The operation is prevented by a unique constraint.
> 
> --- 0 row(s) inserted.
177c214,216
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
181c220,222
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
200c241
<  1 1 1 1 1 1
---
>  1 45 1 1 1 1
205,208d245
< *** WARNING[8597] Statement was automatically retried 1 time(s). Delay before 
each retry was 0 seconds. See next entry for the error that caused this retry.
< 
< *** WARNING[8734] Statement must be recompiled to allow privileges to be 
re-evaluated.
< 
218c255
<  1 23 1 1 1 1
---
>  1 67 1 1 1 1
236c273,275
< --- SQL operation complete.
---
> *** ERROR[1222] Command not supported when authorization is not enabled.
> 
> --- SQL operation failed with errors.
254c293
<  1 23 1 1 1 1
---
>  1 67 1 1 1 1
259,267c298
< *** ERROR[4481] 

[jira] [Created] (TRAFODION-1876) CORE/TEST116 fails when run in standalone

2016-03-04 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1876:
--

 Summary: CORE/TEST116 fails when run in standalone
 Key: TRAFODION-1876
 URL: https://issues.apache.org/jira/browse/TRAFODION-1876
 Project: Apache Trafodion
  Issue Type: Bug
  Components: Build Infrastructure
 Environment: using trafodion master of March 4th 2016 and running on a 
development build machine.
Reporter: Eric Owhadi


using this command to run test:
$scriptsdir/core/runregr -sb TEST116

getting this diff116:
454c454
< *** ERROR[1431] Object #CAT.#SCH.T116T1 exists in HBase. This could be due to 
a concurrent transactional ddl operation in progress on this table.
---
> *** ERROR[1390] Object TRAFODION.TRAFODION.T116T1 already exists in TRAFODION.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1811) upsert failure when using nullable key columns

2016-02-03 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1811:
--

 Summary: upsert failure when using nullable key columns
 Key: TRAFODION-1811
 URL: https://issues.apache.org/jira/browse/TRAFODION-1811
 Project: Apache Trafodion
  Issue Type: Bug
  Components: sql-exe
Reporter: Eric Owhadi


Upsert fails on predicate evaluation when a nullable column is used as part of the 
PK. Various consecutive, probably related, problems follow... see the test below:
 
>>create table t(a int, b int not null not droppable unique, primary key(a,b));
 
--- SQL operation complete.
>>insert into t values(null,1);
 
--- 1 row(s) inserted.
>>select * from t;
 
A            B
-----------  -----------

          ?            1
 
--- 1 row(s) selected.
>>upsert into t values(null,2);
 
*** ERROR[4099] A NULL operand is not allowed in predicate 
(TRAFODION.TPCDSGOOD.T.A = NULL).
 
*** ERROR[8822] The statement was not prepared.
 
>>upsert into t values(2,2);
 
--- 1 row(s) inserted.
>>upsert into t values(1,1);
 
--- 1 row(s) inserted.
>>select * from t;
 
A            B
-----------  -----------

          ?            1
 
--- 1 row(s) selected.
>>insert into t values (2,2);
 
*** ERROR[8102] The operation is prevented by a unique constraint.
 
--- 0 row(s) inserted.
>>select * from t;
 
A            B
-----------  -----------

          ?            1
 
--- 1 row(s) selected.
>>select * from t where b=2;
 
--- 0 row(s) selected.
>>insert into t values (3,3);
 
--- 1 row(s) inserted.
>>select * from t;
 
A            B
-----------  -----------

          ?            1
          3            3
 
--- 2 row(s) selected.
 
So 3 questions after this test:
-  Shouldn’t upsert use the special null semantics instead of failing 
on predicate evaluation?
-  It looks like the upsert of (2,2) succeeded, but the next select * does not show 
it… however, trying to insert (2,2) fails due to the unique key constraint… so it 
looks like the upsert worked only halfway…
-  Why does the second upsert say there was an insert, when the next 
select * statement shows that neither an insert nor an update was performed?
 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance

2016-01-27 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi closed TRAFODION-1420.
--

> Use ClientSmallScanner for small scans to improve performance
> -
>
> Key: TRAFODION-1420
> URL: https://issues.apache.org/jira/browse/TRAFODION-1420
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-cmp, sql-exe
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: performance
>
> HBase implements an optimization for small scans (defined as scans touching less 
> than one data block, i.e. 64 KB) resulting in a 3X performance improvement. The 
> underlying trick is cutting the RPC calls from three (OpenScan/Next/Close) down 
> to one, and using stateless pread instead of the stateful, locking seek/read 
> method to read data. This JIRA is about making the compiler aware of whether a 
> scan will act on a single data block (small) or not, and passing this 
> information to the executor so that it can set the right scan parameter 
> (scan.setSmall(boolean)).
> reference:
> https://issues.apache.org/jira/browse/HBASE-9488
> https://issues.apache.org/jira/browse/HBASE-7266
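
As an illustration, here is a minimal sketch of requesting the small-scan path 
through the HBase 1.0-era Java client API (the table and row names are made up; 
this is a sketch of the HBase API, not the actual Trafodion executor code):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SmallScanSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("TRAFODION.SCH.T"))) {
      // A bounded scan the compiler knows stays inside one 64 KB data block.
      Scan scan = new Scan(Bytes.toBytes("row000"), Bytes.toBytes("row100"));
      scan.setSmall(true); // one RPC + stateless pread instead of OpenScan/Next/Close
      try (ResultScanner rs = table.getScanner(scan)) {
        for (Result r : rs) {
          System.out.println(Bytes.toString(r.getRow()));
        }
      }
    }
  }
}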



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance

2016-01-27 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi resolved TRAFODION-1420.

Resolution: Fixed

implemented and merged in PR 284

> Use ClientSmallScanner for small scans to improve performance
> -
>
> Key: TRAFODION-1420
> URL: https://issues.apache.org/jira/browse/TRAFODION-1420
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-cmp, sql-exe
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: performance
> Fix For: 2.0-incubating
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance

2016-01-27 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi updated TRAFODION-1420:
---
Fix Version/s: (was: 2.0-incubating)

> Use ClientSmallScanner for small scans to improve performance
> -
>
> Key: TRAFODION-1420
> URL: https://issues.apache.org/jira/browse/TRAFODION-1420
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-cmp, sql-exe
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: performance
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1771) TESTRTS fails

2016-01-26 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118457#comment-15118457
 ] 

Eric Owhadi commented on TRAFODION-1771:


Hi Sandhya,
I have a test pull request that I kept open for you that was showcasing the
failure on Jenkins. If you need it let me know, else I can delete it.
Cheers,
Eric



> TESTRTS fails
> -
>
> Key: TRAFODION-1771
> URL: https://issues.apache.org/jira/browse/TRAFODION-1771
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 2.0-incubating
>Reporter: Eric Owhadi
>Assignee: Selvaganesan Govindarajan
> Attachments: corefiles.log
>
>
> TESTRTS is failing with a core dump on PR255



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1783) DIVISION BY feature not documented in SQL reference manual

2016-01-26 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1783:
--

 Summary: DIVISION BY feature not documented in SQL reference manual
 Key: TRAFODION-1783
 URL: https://issues.apache.org/jira/browse/TRAFODION-1783
 Project: Apache Trafodion
  Issue Type: Documentation
  Components: documentation
Affects Versions: any
Reporter: Eric Owhadi






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (TRAFODION-1662) Predicate push down revisited (V2)

2016-01-21 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi closed TRAFODION-1662.
--
Resolution: Fixed

PR-255

> Predicate push down revisited (V2)
> --
>
> Key: TRAFODION-1662
> URL: https://issues.apache.org/jira/browse/TRAFODION-1662
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-exe
>Affects Versions: 2.0-incubating
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: predicate, pushdown
> Attachments: Advanced predicate push down feature.docx, Advanced 
> predicate push down feature.docx, Performance results analyzing effects of 
> optimizations introduced in pushdown V2.docx
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TRAFODION-1662) Predicate push down revisited (V2)

2016-01-21 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi updated TRAFODION-1662:
---
Attachment: Performance results analyzing effects of optimizations 
introduced in pushdown V2.docx

performance impact of predicate pushdown V2

> Predicate push down revisited (V2)
> --
>
> Key: TRAFODION-1662
> URL: https://issues.apache.org/jira/browse/TRAFODION-1662
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-exe
>Affects Versions: 2.0-incubating
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: predicate, pushdown
> Attachments: Advanced predicate push down feature.docx, Advanced 
> predicate push down feature.docx, Performance results analyzing effects of 
> optimizations introduced in pushdown V2.docx
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>



--
This message was sent by Atlassian JIRA

[jira] [Commented] (TRAFODION-1771) TESTRTS fails

2016-01-21 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111518#comment-15111518
 ] 

Eric Owhadi commented on TRAFODION-1771:


Steve Arnaud can provide access to a VM showing the problem 100% of the time, 
with just 2 tests to run (TEST005 followed by TESTRTS).

> TESTRTS fails
> -
>
> Key: TRAFODION-1771
> URL: https://issues.apache.org/jira/browse/TRAFODION-1771
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 2.0-incubating
>Reporter: Eric Owhadi
> Attachments: corefiles.log
>
>
> TESTRTS is failing with a core dump on PR255



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TRAFODION-1771) TESTRTS fails

2016-01-21 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi updated TRAFODION-1771:
---
Attachment: corefiles.log

Back trace file at time of failure

> TESTRTS fails
> -
>
> Key: TRAFODION-1771
> URL: https://issues.apache.org/jira/browse/TRAFODION-1771
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 2.0-incubating
>Reporter: Eric Owhadi
> Attachments: corefiles.log
>
>
> TESTRTS is failing with a core dump on PR255



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1771) TESTRTS fails

2016-01-21 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111512#comment-15111512
 ] 

Eric Owhadi commented on TRAFODION-1771:


I have the core dump if someone needs it

> TESTRTS fails
> -
>
> Key: TRAFODION-1771
> URL: https://issues.apache.org/jira/browse/TRAFODION-1771
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 2.0-incubating
>Reporter: Eric Owhadi
>
> TESTRTS is failing with a core dump on PR255



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1771) TESTRTS fails

2016-01-21 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111511#comment-15111511
 ] 

Eric Owhadi commented on TRAFODION-1771:


For the last 2 days I have been hunting, with the great help of Steve Arnaud, a 
weird failure blocking the merge of PR 255 (predicate pushdown V2).

[TRAFODION-1662].

13 days ago, PR 255 was passing Jenkins.

6 days ago, after applying changes related to code review on PR 255 and 
synching to latest master, Jenkins started failing the core/TESTRTS with a core 
dump.


So I created a fake PR on a new branch, with the initial code from 13 days ago, 
and merged it with the latest master -> Jenkins fails with the same error, 
demonstrating that the PR rework was not the root cause of this sudden wrong 
behavior.


So the issue is a combination of my new code (initial or after rework) with 
some changes in master that happened between 13 days ago and 6 days ago.


This failure does not happen in the dev environment (tested in both debug and 
release mode).


Steve was able to duplicate it on a Jenkins server, and narrowed down the 
condition for its appearance to the sequence core/TEST005 followed by 
core/TESTRTS.

Without TEST005 as a catalyst, the issue does not manifest on the Jenkins server 
either.


The stack trace at the time of the crash is not very helpful; it shows (in red) an 
abort while setting SQLCODE -8926 (8926 is the constant value of 
EXE_STAT_NOT_FOUND):

EXE_STAT_NOT_FOUND can come from 34 different code paths and, unfortunately, the 
structure of the code does not help narrow down which one of the 34 was 
crossed at the time of death through stack trace analysis. -> I hate Murphy…

Thread 1 (Thread 0x7f26080e23c0 (LWP 1505)):

#0  0x7f2605260625 in raise () from /lib64/libc.so.6

#1  0x7f2605261d8d in abort () from /lib64/libc.so.6

#2  0x7f2604d39494 in ComCondition::setSQLCODE (this=<value optimized out>, newSQLCODE=-8926) at ../export/ComDiags.cpp:1428

#3  0x7f2603911c56 in ExHandleErrors (qparent=..., down_entry=<value optimized out>, matchNo=<value optimized out>, globals=<value optimized out>, diags_in=<value optimized out>, err=4294958370, intParam1=0x0, stringParam1=0x0, nskErr=0x0, stringParam2=0x0) at ../executor/ex_error.cpp:170

#4  0x7f2603a01e26 in ExExeUtilGetRTSStatisticsTcb::work (this=0x7f25f39eec58) at ../executor/ExExeUtilGetStats.cpp:4222

#5  0x7f2603a5ee33 in ExScheduler::work (this=0x7f25f39ee7c0, prevWaitTime=<value optimized out>) at ../executor/ExScheduler.cpp:331

#6  0x7f2603973752 in ex_root_tcb::execute (this=0x7f25f39f4c50, cliGlobals=0x2b70120, glob=0x7f25f39b2ca8, input_desc=0x7f25f39aa030, diagsArea=@0x7ffd2e100750, reExecute=0) at ../executor/ex_root.cpp:1058

#7  0x7f2604fe7654 in CliStatement::execute (this=0x7f25f39c0ea0, cliGlobals=0x2b70120, input_desc=0x7f25f39aa030, diagsArea=<value optimized out>, execute_state=<value optimized out>, fixupOnly=0, cliflags=0) at ../cli/Statement.cpp:4525

#8  0x7f2604f892ac in SQLCLI_PerformTasks(CliGlobals *, ULng32, SQLSTMT_ID *, SQLDESC_ID *, SQLDESC_ID *, Lng32, Lng32, typedef __va_list_tag __va_list_tag *, SQLCLI_PTR_PAIRS *, SQLCLI_PTR_PAIRS *) (cliGlobals=0x2b70120, tasks=4882, statement_id=0x3398010, input_descriptor=0x34c0eb0, output_descriptor=0x0, num_input_ptr_pairs=0, num_output_ptr_pairs=0, ap=0x7ffd2e1008f0, input_ptr_pairs=0x0, output_ptr_pairs=0x0) at ../cli/Cli.cpp:3297

#9  0x7f2604f89fe2 in SQLCLI_Exec(CliGlobals *, SQLSTMT_ID *, SQLDESC_ID *, Lng32, typedef __va_list_tag __va_list_tag *, SQLCLI_PTR_PAIRS *) (cliGlobals=<value optimized out>, statement_id=<value optimized out>, input_descriptor=<value optimized out>, num_ptr_pairs=<value optimized out>, ap=<value optimized out>, ptr_pairs=<value optimized out>) at ../cli/Cli.cpp:3544

#10 0x7f2604ff588b in SQL_EXEC_Exec (statement_id=0x3398010, input_descriptor=0x34c0eb0, num_ptr_pairs=0) at ../cli/CliExtern.cpp:2074

#11 0x7f26078bb99b in SqlCmd::doExec (sqlci_env=0x2b58c50, stmt=0x3398010, prep_stmt=<value optimized out>, numUnnamedParams=<value optimized out>, unnamedParamArray=<value optimized out>, unnamedParamCharSetArray=<value optimized out>, handleError=1) at ../sqlci/SqlCmd.cpp:1786

#12 0x7f26078bc392 in SqlCmd::do_execute (sqlci_env=0x2b58c50, prep_stmt=0x2cefd50, numUnnamedParams=0, unnamedParamArray=0x0, unnamedParamCharSetArray=0x0, prepcode=0) at ../sqlci/SqlCmd.cpp:2122

#13 0x7f26078bcabd in DML::process (this=0x2cf0080, sqlci_env=0x2b58c50) at ../sqlci/SqlCmd.cpp:2897

#14 0x7f26078a2844 in Obey::process (this=0x2ceff60, sqlci_env=<value optimized out>) at ../sqlci/Obey.cpp:267

#15 0x7f26078a2844 in Obey::process (this=0x385d960, sqlci_env=<value optimized out>) at ../sqlci/Obey.cpp:267

#16 0x7f26078a2844 in Obey::process (this=0x4662980, sqlci_env=<value optimized out>) at ../sqlci/Obey.cpp:267

#17 0x7f26078ab074 in SqlciEnv::run (this=0x2b58c50, in_filename=<value optimized out>, input_string=<value optimized out>) at ../sqlci/SqlciEnv.cpp:729

#18 0x004019d2 in main (argc=2, argv=0x7ffd2e102578) at ../bin/SqlciMain.cpp:329




> TESTRTS fails
> -
>
> Key: TRAFODION-1771
> URL: https://issues.apache.org/jira/browse/TRAFODION-1771
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Ver

[jira] [Created] (TRAFODION-1771) TESTRTS fails

2016-01-21 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1771:
--

 Summary: TESTRTS fails
 Key: TRAFODION-1771
 URL: https://issues.apache.org/jira/browse/TRAFODION-1771
 Project: Apache Trafodion
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 2.0-incubating
Reporter: Eric Owhadi


TESTRTS is failing with a core dump on PR255



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TRAFODION-1662) Predicate push down revisited (V2)

2016-01-08 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi updated TRAFODION-1662:
---
Attachment: Advanced predicate push down feature.docx

latest version, after code completed

> Predicate push down revisited (V2)
> --
>
> Key: TRAFODION-1662
> URL: https://issues.apache.org/jira/browse/TRAFODION-1662
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-exe
>Affects Versions: 2.0-incubating
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: predicate, pushdown
> Attachments: Advanced predicate push down feature.docx, Advanced 
> predicate push down feature.docx
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1671) hive regression TEST009 FAILS

2015-12-03 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1671:
--

 Summary: hive regression TEST009 FAILS
 Key: TRAFODION-1671
 URL: https://issues.apache.org/jira/browse/TRAFODION-1671
 Project: Apache Trafodion
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 1.3-incubating
Reporter: Eric Owhadi
Priority: Minor


regression test hive TEST009 fails (30 lines):
diff files:
28c28,34
< --- 0 row(s) selected.
---
> SCHEMA_NAME
> --
> 
> _HBASESTATS_
> _HB__CELL__
> 
> --- 2 row(s) selected.
446,451c452,457
<  2 HIVE HIVE ITEM
<  3 TRAFODION _HV_HIVE_ PROMOTION
<  4 HIVE HIVE PROMOTION
<  5 TRAFODION _HV_SCH_T009_ T009T2
<  6 HIVE SCH_T009 T009T2
<  7 HIVE HIVE CUSTOMER
---
>  2 TRAFODION _HV_HIVE_ PROMOTION
>  3 HIVE HIVE PROMOTION
>  4 TRAFODION _HV_SCH_T009_ T009T2
>  5 HIVE SCH_T009 T009T2
>  6 HIVE HIVE CUSTOMER
>  7 HIVE HIVE ITEM
507a514
> _HBASESTATS_
511c518
< --- 2 row(s) selected.
---
> --- 3 row(s) selected.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1662) Predicate push down revisited (V2)

2015-12-01 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034713#comment-15034713
 ] 

Eric Owhadi commented on TRAFODION-1662:


Thanks Suresh,
1/ me too :-)
2/ I am hoping that RMS can just use the cardinality of the table as provided by 
stats for the rows accessed, assuming we have done a full scan?
3/ I propose we open a JIRA on dynamic RPC timeout covering both this case and 
the coprocessor count. But I think that in future versions of HBase there is some 
form of heartbeat that should avoid the current RPC timeout issue. To be 
investigated.
4/ good point, thanks for mentioning it
5/ indeed, I'll keep this in mind.
6/ I'll have a look, thanks for mentioning it


> Predicate push down revisited (V2)
> --
>
> Key: TRAFODION-1662
> URL: https://issues.apache.org/jira/browse/TRAFODION-1662
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-exe
>Affects Versions: 2.0-incubating
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: predicate, pushdown
> Attachments: Advanced predicate push down feature.docx
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>

[jira] [Updated] (TRAFODION-1662) Predicate push down revisited (V2)

2015-12-01 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi updated TRAFODION-1662:
---
Attachment: Advanced predicate push down feature.docx

blueprint

> Predicate push down revisited (V2)
> --
>
> Key: TRAFODION-1662
> URL: https://issues.apache.org/jira/browse/TRAFODION-1662
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-exe
>Affects Versions: 2.0-incubating
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: predicate, pushdown
> Attachments: Advanced predicate push down feature.docx
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (TRAFODION-1662) Predicate push down revisited (V2)

2015-12-01 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on TRAFODION-1662 started by Eric Owhadi.
--
> Predicate push down revisited (V2)
> --
>
> Key: TRAFODION-1662
> URL: https://issues.apache.org/jira/browse/TRAFODION-1662
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-exe
>Affects Versions: 2.0-incubating
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: predicate, pushdown
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1662) Predicate push down revisited (V2)

2015-12-01 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1662:
--

 Summary: Predicate push down revisited (V2)
 Key: TRAFODION-1662
 URL: https://issues.apache.org/jira/browse/TRAFODION-1662
 Project: Apache Trafodion
  Issue Type: Improvement
  Components: sql-exe
Affects Versions: 2.0-incubating
Reporter: Eric Owhadi


Currently Trafodion predicate push down to HBase supports only the following 
case:
<col> <op> <value> AND <col> <op> <value> AND …
It requires columns to be “SERIALIZED” (comparable using a binary comparator), 
the value data type must not be a superset of the column data type,
the char type must not be case insensitive or upshifted,
and there is no support for Big Numbers.


It suffers from several issues:
-   Handling of nullable columns:
When a nullable column is involved in the predicate, because of the way nulls 
are handled in Trafodion (either a missing cell, or a cell with the first byte 
set to xFF), a binary compare cannot do a good job of semantically treating NULL 
the way a SQL expression would require. So the current behavior is that null 
column values are never filtered out and always returned, letting Trafodion 
perform a second-pass predicate evaluation to deal with nulls. This can quickly 
turn counterproductive for very sparse columns, as we would perform useless 
filtering on the region server side (since all nulls pass), and the optimizer 
has not been coded to turn off the feature on sparse columns.
In addition, since null handling is done on the Trafodion side, the current code 
artificially pulls all key columns to make sure that a null coded as an absent 
cell is correctly pushed up for evaluation at the Trafodion layer. This could be 
optimized by requiring only a single non-nullable column in the current code, but 
this is another story… as you will see below, the proposed new way of doing 
pushdown will handle 100% of nulls at the HBase layer, therefore requiring a 
non-nullable column to be added only when a nullable column is needed in the 
select statement (not in the predicate).
-   Always returning predicate columns:
Select a from t where b>10 would always return the b column to Trafodion, even 
if b is non-nullable. This is not necessary and results in useless network 
and CPU consumption, even if the predicate is not re-evaluated.

The new advanced predicate push down feature will do the following:
Support any of these primitives:
=, <, <=, >, >=
<> (nice to have; high cost of a custom filter for low value, after a TPC-DS 
query survey)
IS NULL
IS NOT NULL
LIKE -> to be investigated, not yet covered in this document
and any combination of these primitives with an arbitrary number of OR and AND 
with ( ) associations, given that within ( ) there is only either any number of 
OR or any number of AND, with no mixing of OR and AND inside ( ). I suspect the 
normalizer will always convert expressions so that this mixing never happens…
It will also remove the 2 shortcomings of the previous implementation: all null 
cases will be handled at the HBase layer, never requiring the evaluation to be 
redone or the associated pushing up of null columns, and predicate columns will 
not be pushed up if not needed by the node for any task other than the predicate 
evaluation.
Note that BETWEEN and IN predicates, when normalized into one of the forms 
supported above, will be pushed down too. Nothing in the code will need to be 
done to support this. 

Improvement of explain:
We currently do not show predicate push down information in the scan node. 2 
key pieces of information are needed:
-   Is predicate push down used
-   What columns are retrieved by the scan node (investigate why we get 
"column all" instead of accurate information)
The first one is obviously used to determine whether all the conditions are met 
to have push down available, and the second is used to make sure we are not 
pushing up data from columns we don’t need.
Note that column info is inconsistently shown today. Need to fix this.
Enablement uses the existing ON/OFF CQD (HBASE_FILTER_PREDS), which will be 
replaced with a multi-value CQD enabling various levels of push down 
optimization, like we have with the PCODE optimization level.
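
As a rough illustration of the kind of filter tree such a pushdown hands to 
HBase, here is a sketch using the stock HBase 1.x filter API for 
WHERE B > 10 AND C = 5 (the column family, qualifiers, and value encodings are 
made-up; the real Trafodion pushdown needs its own filter/comparator logic 
precisely because plain binary comparison cannot honor the xFF null encoding 
described above):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PushdownSketch {
  public static Scan buildScan() {
    byte[] cf = Bytes.toBytes("#1"); // assumed column family name
    // B > 10
    SingleColumnValueFilter gt = new SingleColumnValueFilter(
        cf, Bytes.toBytes("B"), CompareOp.GREATER, Bytes.toBytes(10));
    gt.setFilterIfMissing(true); // a missing cell (one NULL encoding) must not pass
    // C = 5
    SingleColumnValueFilter eq = new SingleColumnValueFilter(
        cf, Bytes.toBytes("C"), CompareOp.EQUAL, Bytes.toBytes(5));
    eq.setFilterIfMissing(true);
    // B > 10 AND C = 5 (MUST_PASS_ONE would express OR instead)
    FilterList and = new FilterList(FilterList.Operator.MUST_PASS_ALL, gt, eq);
    Scan scan = new Scan();
    scan.setFilter(and);
    return scan;
  }
}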




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1487) Hedged reads to boost read performance since HBase 1.0

2015-09-07 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1487:
--

 Summary: Hedged reads to boost read performance since HBase 1.0
 Key: TRAFODION-1487
 URL: https://issues.apache.org/jira/browse/TRAFODION-1487
 Project: Apache Trafodion
  Issue Type: Sub-task
Reporter: Eric Owhadi


see here how to configure it: 
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/admin_hedged_reads.html
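
For reference, a minimal sketch of the two HDFS client settings involved, with 
illustrative values (in a real deployment they would typically go into the 
region servers' hbase-site.xml rather than be set in code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HedgedReadSketch {
  public static Configuration configure() {
    Configuration conf = HBaseConfiguration.create();
    // Dedicated thread pool for hedged reads; 0 (the default) disables the feature.
    conf.setInt("dfs.client.hedged.read.threadpool.size", 20);
    // Fire a second, hedged read if the first has not answered within 10 ms.
    conf.setLong("dfs.client.hedged.read.threshold.millis", 10);
    return conf;
  }
}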




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1486) MultiWAL for write-heavy workloads is promising; available since HBase 1.0

2015-09-07 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1486:
--

 Summary: MultiWAL for write-heavy workloads is promising; 
available since HBase 1.0
 Key: TRAFODION-1486
 URL: https://issues.apache.org/jira/browse/TRAFODION-1486
 Project: Apache Trafodion
  Issue Type: Sub-task
Reporter: Eric Owhadi


https://issues.apache.org/jira/browse/HBASE-5699
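
A minimal sketch of enabling it (hbase.wal.provider set to "multiwal" is the 
documented HBase 1.0+ switch; the group count value here is illustrative, and 
these settings normally live in hbase-site.xml on the region servers rather than 
in code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MultiWalSketch {
  public static Configuration configure() {
    Configuration conf = HBaseConfiguration.create();
    // Let each region server spread its regions over several WALs instead of one.
    conf.set("hbase.wal.provider", "multiwal");
    // Number of WAL groups per region server (HBase's default is 2).
    conf.setInt("hbase.wal.regiongrouping.numgroups", 4);
    return conf;
  }
}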




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1485) Patch for hbase.client.scanner.timeout.period logic until fix in hbase is available

2015-09-06 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732367#comment-14732367
 ] 

Eric Owhadi commented on TRAFODION-1485:


Pasting the discussion on the subject from the hbase dev list:
You can take a look at HBASE-1: Renew Scanner Lease without advancing the 
RegionScanner, which may be helpful in this kind of case. Your proposal sounds 
like a good alternative approach as well.
We should add that JIRA to the blog link Stack mentioned.

Jerry

On Sat, Sep 5, 2015 at 9:07 AM, Stack  wrote:

> On Fri, Sep 4, 2015 at 5:06 PM, Eric Owhadi  wrote:
>
> > OK so to answer the "is it easy to insert the patched scanner for 
> > trafodion", the answer is no.
> >
>
> I suspected this.
>
>
>
> > It was easier on 0.98, but on 1.0 it was quite a challenge: all about 
> > dealing with private attributes instead of protected ones, which are not 
> > visible to the PatchClientScanner class that extends ClientScanner.
> > Currently running the regression tests to see if there is any side effect...
> > I was able to demonstrate, with a breakpoint on next() waiting more than 
> > 1 min (the default lease timeout value), that with the patch things 
> > gracefully reset and all is good, no row skipped or duplicated, 
> > while without it, I get the scanner timeout exception. The patch can be 
> > turned on or off with a new key in hbase-site.xml...
> > I will feel better when this is deprecated :-).
> >
>
> Smile.
>
> Excellent. You have a patch for us then Eric?  Sounds like the 
> interjection of your new Scanner would be for pre-2.0. For 2.0 we 
> should just turn on this behavior as the default.
>
> Thanks,
> St.Ack
>
>
>
> > Eric Owhadi
> >
> > -Original Message-
> > From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of
> Stack
> > Sent: Friday, August 28, 2015 6:35 PM
> > To: HBase Dev List 
> > Subject: Re: Question on hbase.client.scanner.timeout.period
> >
> > On Fri, Aug 28, 2015 at 11:31 AM, Eric Owhadi 
> > 
> > wrote:
> >
> > > That sounds good, but given Trafodion needs to work on current and 
> > > future released versions of HBase, unpatched, I will first 
> > > implement a ClientScannerTrafodion (to be deprecated), inheriting 
> > > from ClientScanner, that will just overload loadCache() and 
> > > make sure that the code that picks the right scanner based 
> > > on the scan object is bypassed to force getting the 
> > > ClientScannerTrafodion when appropriate.
> > > Not very elegant, but need to take into consideration trafodion 
> > > deployment requirements.
> > > Then, if we do not discover any side effect during our QA related 
> > > to this code I will port the fix on HBase to deprecate the custom 
> > > scanner (probably first on HBase 2.0, then will let the community 
> > > decide if this fix is worth it for back porting...). It will be a 
> > > first for me, but that's great, I'll take your offer to help ;-)...
> > >
> >
> > Sweet. Suggest opening an umbrellas issue in hbase to implement this 
> > feature. Reference HBASE-2161 (it is closed now). Link trafodion 
> > issue to it. A subtask could have implementation in hbase 2.0, 
> > another could be backport.
> >
> > Is is easy to insert your T*ClientScanner?
> > St.Ack
> >
> >
> >
> > > Regards,
> > > Eric
> > >
> > > -Original Message-
> > > From: saint@gmail.com [mailto:saint@gmail.com] On Behalf 
> > > Of Stack
> > > Sent: Thursday, August 27, 2015 3:55 PM
> > > To: HBase Dev List 
> > > Subject: Re: Question on hbase.client.scanner.timeout.period
> > >
> > > On Thu, Aug 27, 2015 at 1:39 PM, Eric Owhadi 
> > > 
> > > wrote:
> > >
> > > > Oops, my bad, the related JIRA was :
> > > > https://issues.apache.org/jira/browse/HBASE-2161
> > > >
> > > > I am suggesting that the special code client side in loadCache() 
> > > > of ClientScanner that is trapping the UnknownScannerException, 
> > > > then on purpose check if it is coming from a lease timeout (and 
> > > > not by a region move) to decide that it would throw a 
> > > > ScannerTimeoutException instead of letting the code go and just 
> > > > reset the scanner and start from last successful retrieve (the 
> > > > way it works for an unknowScannerException due to a region moving).
> > > > By just removing the special handling that tries to 
> > > > differentiate from unkownScannerException due to lease timeout, 
> > > > we should have a resolution to JIRA 2161- And to our trafodion issue.
> > > >
> > > > We are still protecting against dead client that would cause 
> > > > resource leak at region server, since we keep the lease timeout 
> > > > mechanism.
> > > >
> > > > Not sure if I have overlooked something, as usually, code is 
> > > > here for a reason :-)...
> > > >
> > > >
> > > Your proposal sounds good to me.
> > >
> > > Scanner works the way it does because it has always worked this way
> > > (smile).
> > > A while back, one of t

[jira] [Work started] (TRAFODION-1485) Patch for hbase.client.scanner.timeout.period logic until fix in hbase is available

2015-09-06 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on TRAFODION-1485 started by Eric Owhadi.
--
> Patch for hbase.client.scanner.timeout.period logic until fix in hbase is 
> available
> ---
>
> Key: TRAFODION-1485
> URL: https://issues.apache.org/jira/browse/TRAFODION-1485
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: sql-exe
>Affects Versions: 1.1 (pre-incubation)
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: patch
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1485) Patch for hbase.client.scanner.timeout.period logic until fix in hbase is available

2015-09-06 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1485:
--

 Summary: Patch for hbase.client.scanner.timeout.period logic until 
fix in hbase is available
 Key: TRAFODION-1485
 URL: https://issues.apache.org/jira/browse/TRAFODION-1485
 Project: Apache Trafodion
  Issue Type: Bug
  Components: sql-exe
Affects Versions: 1.1 (pre-incubation)
Reporter: Eric Owhadi


We have been facing a situation in Trafodion where we are hitting the 
hbase.client.scanner.timeout.period scenario:
basically, when doing queries that require spilling to disk because of the 
high complexity of what is involved, the underlying hbase scanner serving one 
of the operations involved in the complex query cannot call next() within the 
specified timeout... too busy taking care of other business.
This is a legitimate scenario, and I was wondering why, in the code, special 
care is taken to make sure that, client side, if a DNRIOE 
(DoNotRetryIOException) of type UnknownScannerException shows up and the 
hbase.client.scanner.timeout.period time has elapsed, we make sure to throw a 
ScannerTimeoutException instead of just letting it go and resetting the 
scanner.
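
For reference, a minimal conceptual sketch in Java of the proposed behavior
(fetch(), reopenAt(), and lastResultKey are hypothetical stand-ins for
ClientScanner internals, not real HBase APIs): on an UnknownScannerException,
always resume at the last key seen, instead of converting the lease-timeout
case into a ScannerTimeoutException.

import java.io.IOException;
import org.apache.hadoop.hbase.UnknownScannerException;
import org.apache.hadoop.hbase.client.Result;

abstract class ResilientScannerSketch {
    protected byte[] lastResultKey; // last row successfully returned

    abstract Result[] fetch() throws IOException;             // next() RPC
    abstract void reopenAt(byte[] startRow) throws IOException;

    Result[] nextBatch() throws IOException {
        try {
            return fetch();
        } catch (UnknownScannerException e) {
            // Lease expired or region moved: in both cases, reopen the
            // scanner at the last key seen and keep going, rather than
            // surfacing a ScannerTimeoutException to the caller.
            reopenAt(lastResultKey);
            return fetch();
        }
    }
}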




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1482) disabling BlockCache for all unbounded scan is not correct for dictionary tables

2015-09-03 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1482:
--

 Summary: disabling BlockCache for all unbounded scan is not 
correct for dictionary tables
 Key: TRAFODION-1482
 URL: https://issues.apache.org/jira/browse/TRAFODION-1482
 Project: Apache Trafodion
  Issue Type: Bug
  Components: sql-cmp, sql-exe
Affects Versions: 1.1 (pre-incubation)
Reporter: Eric Owhadi


There is a workaround that was implemented to avoid block cache thrashing 
triggered by full table scans.
It is in HTableClient.java, in a line that looks like:

// Disable block cache for full table scans
if (startRow == null && stopRow == null)
    scan.setCacheBlocks(false);
 
This line overrides the cacheBlocks parameter passed to startScan, hence it 
is a workaround.
 
However, this workaround is potentially negative from another performance 
angle in situations like "dictionary tables" in a normalized schema.
For example, if you have tables storing status codes, error codes, countries, 
etc., linked to via foreign keys, these tables are small and I would imagine 
they will most likely be fetched and spread across ESPs for hash joins with 
startRow and stopRow null. They won't be cached with the workaround, but they 
should be. Cache thrashing is a problem only when scanning large tables.
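
A minimal sketch in Java of the alternative, assuming the compiler's
cacheBlocks decision is passed down (buildScan and its parameters are
illustrative, not the actual HTableClient.java signature):

import org.apache.hadoop.hbase.client.Scan;

public class ScanBuilderSketch {
    // Honor the caller's cacheBlocks choice instead of forcing it off
    // for every unbounded scan.
    static Scan buildScan(byte[] startRow, byte[] stopRow,
                          boolean cacheBlocks) {
        Scan scan = new Scan();
        if (startRow != null) scan.setStartRow(startRow);
        if (stopRow != null) scan.setStopRow(stopRow);
        // Small dictionary tables keep cacheBlocks == true even for a
        // full-table scan; large scans pass false to avoid thrashing.
        scan.setCacheBlocks(cacheBlocks);
        return scan;
    }
}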




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TRAFODION-1446) End Key missing for simple scan scenario

2015-08-11 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Owhadi updated TRAFODION-1446:
---
Priority: Blocker  (was: Critical)

> End Key missing for simple scan scenario
> 
>
> Key: TRAFODION-1446
> URL: https://issues.apache.org/jira/browse/TRAFODION-1446
> Project: Apache Trafodion
>  Issue Type: Bug
>  Components: sql-exe
>Affects Versions: 2.0-incubating
> Environment: see on developer build
>Reporter: Eric Owhadi
>Priority: Blocker
>
> Using a table created with something like:
> Create table t131helper (a int not null, primary key(a));
> Insert into t131helper values(1);
> Create table t131oneblock
> (uniq int not null,
> C100 int,
> Str1 varchar(4000),
> Primary key (uniq));
> Insert into t131oneblock
>   Select (100*x100)+(10*x10)+x1,
> (100*x100)+(10*x10)+x1,
> 'xxx'
> From t131helper
> Transpose 0,1,2,3,4,5,6,7,8,9 as x100
> Transpose 0,1,2,3,4,5,6,7,8,9 as x10
> Transpose 0,1,2,3,4,5,6,7,8,9 as x1;
> so basically creating a table with keys 0,1,2,3,4 etc. until 999.
> Then doing a 
> Select * from t131oneblock where uniq > 2 and uniq < 5
> you will see that the end key is not populated on the scan operator (use 
> explain to notice the empty end key). I stepped through the code, and the 
> error is not just a wrong display in the explain output; the end key is not 
> populated down to the java hbase scan invocation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1446) End Key missing for simple scan scenario

2015-08-11 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1446:
--

 Summary: End Key missing for simple scan scenario
 Key: TRAFODION-1446
 URL: https://issues.apache.org/jira/browse/TRAFODION-1446
 Project: Apache Trafodion
  Issue Type: Bug
  Components: sql-exe
Affects Versions: 2.0-incubating
 Environment: see on developer build
Reporter: Eric Owhadi
Priority: Critical


Using a table created with something like:

Create table t131helper (a int not null, primary key(a));
Insert into t131helper values(1);

Create table t131oneblock
(uniq int not null,
C100 int,
Str1 varchar(4000),
Primary key (uniq));
Insert into t131oneblock
  Select (100*x100)+(10*x10)+x1,
(100*x100)+(10*x10)+x1,
'xxx'
From t131helper
Transpose 0,1,2,3,4,5,6,7,8,9 as x100
Transpose 0,1,2,3,4,5,6,7,8,9 as x10
Transpose 0,1,2,3,4,5,6,7,8,9 as x1;


so basically creating a table with keys 0,1,2,3,4 etc. until 999.

Then doing a 

Select * from t131oneblock where uniq > 2 and uniq < 5

you will see that the end key is not populated on the scan operator (use 
explain to notice the empty end key). I stepped through the code, and the 
error is not just a wrong display in the explain output; the end key is not 
populated down to the java hbase scan invocation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance

2015-07-29 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on TRAFODION-1420 started by Eric Owhadi.
--
> Use ClientSmallScanner for small scans to improve performance
> -
>
> Key: TRAFODION-1420
> URL: https://issues.apache.org/jira/browse/TRAFODION-1420
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-cmp, sql-exe
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: performance
> Fix For: 2.0-incubating
>
>
> HBase implements an optimization for small scans (defined as scanning less 
> than one data block, i.e. 64 KB) resulting in a 3X performance improvement. 
> The underlying trick is to cut the RPC calls from 3 (OpenScan/Next/Close) 
> down to 1, and to use stateless pread instead of the stateful, locking 
> seek/read method to read data. This JIRA is about making the compiler aware 
> of whether a scan will act on a single data block (small) or not, and 
> passing that information to the executor so that it can set the right scan 
> parameter (scan.setSmall(boolean)).
> reference:
> https://issues.apache.org/jira/browse/HBASE-9488
> https://issues.apache.org/jira/browse/HBASE-7266
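
A minimal sketch in Java of what the executor-side knob looks like
(fitsInOneBlock is an illustrative stand-in for the compiler's compile-time
estimate that the scan range stays within one 64 KB data block; the table
handle and key ranges are assumptions):

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class SmallScanSketch {
    static void scanRange(HTable table, byte[] start, byte[] stop,
                          boolean fitsInOneBlock) throws IOException {
        Scan scan = new Scan(start, stop);
        // When true, HBase collapses OpenScan/Next/Close into a single
        // RPC and reads via stateless pread on the region server side.
        scan.setSmall(fitsInOneBlock);
        ResultScanner rs = table.getScanner(scan);
        try {
            for (Result r : rs) {
                // consume row
            }
        } finally {
            rs.close();
        }
    }
}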



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance

2015-07-29 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646414#comment-14646414
 ] 

Eric Owhadi commented on TRAFODION-1420:


After code analysis, it turns out that in order for the small scanner 
improvement to be significant, we also need to implement 2 different 
"transactional small scanners". 

> Use ClientSmallScanner for small scans to improve performance
> -
>
> Key: TRAFODION-1420
> URL: https://issues.apache.org/jira/browse/TRAFODION-1420
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-cmp, sql-exe
>Reporter: Eric Owhadi
>  Labels: performance
> Fix For: 2.0-incubating
>
>
> HBase implements an optimization for small scans (defined as scanning less 
> than one data block, i.e. 64 KB) resulting in a 3X performance improvement. 
> The underlying trick is to cut the RPC calls from 3 (OpenScan/Next/Close) 
> down to 1, and to use stateless pread instead of the stateful, locking 
> seek/read method to read data. This JIRA is about making the compiler aware 
> of whether a scan will act on a single data block (small) or not, and 
> passing that information to the executor so that it can set the right scan 
> parameter (scan.setSmall(boolean)).
> reference:
> https://issues.apache.org/jira/browse/HBASE-9488
> https://issues.apache.org/jira/browse/HBASE-7266



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1430) existing feature in hbase .98 allowing to perform parallel seeks in store scanner

2015-07-28 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1430:
--

 Summary: existing feature in hbase .98 allowing to perform 
parallel seeks in store scanner
 Key: TRAFODION-1430
 URL: https://issues.apache.org/jira/browse/TRAFODION-1430
 Project: Apache Trafodion
  Issue Type: Sub-task
Reporter: Eric Owhadi
Priority: Minor


https://issues.apache.org/jira/browse/HBASE-7495
parallel seek in StoreScanner

Not sure if we ever tried this knob: 
hbase.storescanner.parallel.seek.enable=true?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1429) enable pread for all scanner, and dedicated scanner with seek/read for compaction

2015-07-28 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1429:
--

 Summary: enable pread for all scanner, and dedicated scanner with 
seek/read for compaction
 Key: TRAFODION-1429
 URL: https://issues.apache.org/jira/browse/TRAFODION-1429
 Project: Apache Trafodion
  Issue Type: Sub-task
Reporter: Eric Owhadi


Optionally enable p-reads and private readers for compactions:
https://issues.apache.org/jira/browse/HBASE-12411

This should be significant for OLTP workloads. Wait until Trafodion supports 
HBase 1.0, then play with the configuration:
hbase.storescanner.use.pread
hbase.regionserver.compaction.private.readers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1428) bottleneck discovered on call Configuration for boolean check on Splice Machine testing

2015-07-28 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1428:
--

 Summary: bottleneck discovered on call Configuration for boolean 
check on Splice Machine testing
 Key: TRAFODION-1428
 URL: https://issues.apache.org/jira/browse/TRAFODION-1428
 Project: Apache Trafodion
  Issue Type: Sub-task
Reporter: Eric Owhadi
Priority: Minor


StoreScanner calls Configuration for Boolean Check on each initialization
https://issues.apache.org/jira/browse/HBASE-12912

Something to monitor, as the patch is not yet implemented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1427) List of potential new features in HBase to test and profile performance on.

2015-07-28 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1427:
--

 Summary: List of potential new features in HBase to test and 
profile performance on.
 Key: TRAFODION-1427
 URL: https://issues.apache.org/jira/browse/TRAFODION-1427
 Project: Apache Trafodion
  Issue Type: Umbrella
  Components: documentation
Affects Versions: 1.1 (pre-incubation), 2.0-incubating
Reporter: Eric Owhadi


This JIRA is about keeping track of existing or future HBase configuration 
settings that could impact performance, and recording findings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance

2015-07-27 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643467#comment-14643467
 ] 

Eric Owhadi commented on TRAFODION-1420:


Qifan, you recall that small scan was enabled by default. Reading the code, I 
don't see how, BUT:
I do see that there is a JIRA on HBase (HBASE-12411, "Optionally enable 
p-reads and private readers for compactions") that is about changing the 
behavior of the scanner globally to always use pread instead of seek/read (so 
it behaves like the small scanner, except for the RPC count optimization), 
and it comes hand in hand with compactions that can optionally use private 
readers (so you can still use seek/read for compactions).
On paper, that looks like the way to go for Trafodion's high concurrency 
workload, since pread is stateless and no locking happens.
Configuration: hbase.storescanner.use.pread on and 
hbase.regionserver.compaction.private.readers on. Is that what you recall as 
"small scanner being already on in Trafodion"? 

> Use ClientSmallScanner for small scans to improve performance
> -
>
> Key: TRAFODION-1420
> URL: https://issues.apache.org/jira/browse/TRAFODION-1420
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-cmp, sql-exe
>Reporter: Eric Owhadi
>  Labels: performance
> Fix For: 2.0-incubating
>
>
> HBase implements an optimization for small scans (defined as scanning less 
> than one data block, i.e. 64 KB) resulting in a 3X performance improvement. 
> The underlying trick is to cut the RPC calls from 3 (OpenScan/Next/Close) 
> down to 1, and to use stateless pread instead of the stateful, locking 
> seek/read method to read data. This JIRA is about making the compiler aware 
> of whether a scan will act on a single data block (small) or not, and 
> passing that information to the executor so that it can set the right scan 
> parameter (scan.setSmall(boolean)).
> reference:
> https://issues.apache.org/jira/browse/HBASE-9488
> https://issues.apache.org/jira/browse/HBASE-7266



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1421) Implement parallel Scanner primitive

2015-07-27 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643324#comment-14643324
 ] 

Eric Owhadi commented on TRAFODION-1421:


Anoop, you are talking about "merging sorted streams". In what I was going to 
implement, the stream seen by the ESP or master executor would not be 
multiple streams, but a single stream of unsorted data (not random data, but 
an intermingling, in a single stream, of data from multiple regions scanned 
in parallel). So for operators that need a sorted stream, that parallel 
scanner would not be appropriate.
Hope this is still useful? I guess it is, since you would get multi-threading 
parallelism on top of ESP (multi-process) parallelism?

> Implement parallel Scanner primitive
> 
>
> Key: TRAFODION-1421
> URL: https://issues.apache.org/jira/browse/TRAFODION-1421
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-cmp, sql-exe
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: performance
> Fix For: 2.0-incubating
>
>
> The ClientScanner API is serial, to preserve key ordering. However, many 
> operators don't care about ordering and would rather get the scan result 
> fast, regardless of order. This JIRA is about providing a parallel scanner 
> that would take care of splitting the work between all region servers as 
> evenly as possible. HBase has had a parallel scanner in the pipe for quite 
> some time (HBASE-9272), but the work has been stalled since October 2013. 
> However, looking at the available code, it looks like a big part of it can 
> be leveraged without requiring a custom HBase build. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1419) Add support for multiple column families in a trafodion table

2015-07-27 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642696#comment-14642696
 ] 

Eric Owhadi commented on TRAFODION-1419:


Great, that is what I was thinking would logically happen. The restriction is 
temporary until the hybrid and/or multiple aligned sub-row features show up. 

> Add support for multiple column families in a trafodion table
> -
>
> Key: TRAFODION-1419
> URL: https://issues.apache.org/jira/browse/TRAFODION-1419
> Project: Apache Trafodion
>  Issue Type: New Feature
>Reporter: Anoop Sharma
>Assignee: Anoop Sharma
>
> This proposal is to add support for multiple column families in trafodion 
> tables. With this feature, one can store columns in multiple column 
> families. One use for this would be to store frequently used columns in 
> one column family and infrequently used columns in a different column 
> family. That will improve performance when those columns are retrieved 
> from hbase. There could be other uses as well.
> Syntax:
> create table <table> (<cf>.<col> <datatype>, <cf>.<col> <datatype>, ...)
>   attributes default column family <cf>;
> alter table <table> add column <cf>.<col> <datatype>;
>   <cf>: name of the column family for that column
> Semantics:
> - The <cf> name follows identifier rules. If <cf> is not double quoted,
>   then it will be upper cased. If double quoted, then case will be
>   maintained.
> - A user specified column family can be of arbitrary length. To optimize
>   space for the column family stored in a cell, a 2 byte encoding is
>   generated. The mapping of the user specified column family to the
>   encoded column family is stored in metadata.
> - If no column family is specified for a column during create table, then
>   the family specified in the 'attributes default column family' clause is
>   used. If no 'attributes default column family' clause is specified, then
>   the system default column family is used.
> - Column family specification is supported for regular and volatile
>   tables.
> - All unique column families specified during create or alter are added to
>   the table.
> - The maximum number of column families supported in one table is 32, but
>   the hbase recommendation is to not create too many column families.
> - The alter statement can be used to assign specific hbase options to
>   specific column families using the NAME clause. If no NAME clause is
>   specified, then alter hbase options are applied to all column families.
> - invoke and showddl statements will show the original user specified
>   column families and not the encoded column families.
> - Currently, multiple column families are not supported for columns of a
>   user created or an implicitly created index. The default column family
>   of the corresponding base table is used for all index columns.
> - A column family cannot be specified in a DML query.
> - A column family cannot be specified for columns of an aligned row format
>   table, since all columns are stored as one cell.
> - Column names must be unique for each table. The same column name cannot
>   be used as part of multiple column families.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1419) Add support for multiple column families in a trafodion table

2015-07-24 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641000#comment-14641000
 ] 

Eric Owhadi commented on TRAFODION-1419:


In the future, are we going to support column families with the aligned row 
format? Is the restriction just to not boil the ocean in one shot?
Column family support is a great feature and gives back some "columnar" 
advantages to trafodion. Cool stuff.

> Add support for multiple column families in a trafodion table
> -
>
> Key: TRAFODION-1419
> URL: https://issues.apache.org/jira/browse/TRAFODION-1419
> Project: Apache Trafodion
>  Issue Type: New Feature
>Reporter: Anoop Sharma
>Assignee: Anoop Sharma
>
> This proposal is to add support for multiple column families in trafodion 
> tables. With this feature, one can store columns in multiple column 
> families. One use for this would be to store frequently used columns in 
> one column family and infrequently used columns in a different column 
> family. That will improve performance when those columns are retrieved 
> from hbase. There could be other uses as well.
> Syntax:
> create table <table> (<cf>.<col> <datatype>, <cf>.<col> <datatype>, ...)
>   attributes default column family <cf>;
> alter table <table> add column <cf>.<col> <datatype>;
>   <cf>: name of the column family for that column
> Semantics:
> - The <cf> name follows identifier rules. If <cf> is not double quoted,
>   then it will be upper cased. If double quoted, then case will be
>   maintained.
> - A user specified column family can be of arbitrary length. To optimize
>   space for the column family stored in a cell, a 2 byte encoding is
>   generated. The mapping of the user specified column family to the
>   encoded column family is stored in metadata.
> - If no column family is specified for a column during create table, then
>   the family specified in the 'attributes default column family' clause is
>   used. If no 'attributes default column family' clause is specified, then
>   the system default column family is used.
> - Column family specification is supported for regular and volatile
>   tables.
> - All unique column families specified during create or alter are added to
>   the table.
> - The maximum number of column families supported in one table is 32, but
>   the hbase recommendation is to not create too many column families.
> - The alter statement can be used to assign specific hbase options to
>   specific column families using the NAME clause. If no NAME clause is
>   specified, then alter hbase options are applied to all column families.
> - invoke and showddl statements will show the original user specified
>   column families and not the encoded column families.
> - Currently, multiple column families are not supported for columns of a
>   user created or an implicitly created index. The default column family
>   of the corresponding base table is used for all index columns.
> - A column family cannot be specified in a DML query.
> - A column family cannot be specified for columns of an aligned row format
>   table, since all columns are stored as one cell.
> - Column names must be unique for each table. The same column name cannot
>   be used as part of multiple column families.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1422) Delete column can be dramatically improved (ALTER statement)

2015-07-24 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640976#comment-14640976
 ] 

Eric Owhadi commented on TRAFODION-1422:


previous discussions before the JIRA was opened:
===
On Wed, Jul 22, 2015 at 9:55 PM, Selva Govindarajan < 
selva.govindara...@esgyn.com> wrote:

> I was referring to the drop column scenario. In Trafodion, we delete all
> cells given a rowid except in the drop column scenario, hence deleteRows
> is not sending the columns parameter. The deleteRow had a column
> parameter to take care of the drop column scenario. So, if we choose any
> of the three options mentioned, we can remove the column parameter in
> deleteRow and introduce the needed new methods.
>
> Yes. At least for this case it is possible to create multiple threads to
> scan and delete in parallel. However, the HTable/RMInterface object is
> not thread safe, so we might need to create as many HTable/RMInterface
> objects as the number of threads and ensure it is transactional too.
>
>
>
> On Wed, Jul 22, 2015 at 4:09 PM, Eric Owhadi 
> wrote:
>
> > Not sure I understand: all this to improve the drop column scenario
> > that we have considered not important?
> > Or are you thinking of another delete scenario?
> >
> > If we want to optimize further by doing a parallel plan, I don't think
> > the optimizer is needed. Using the same mechanism that I am planning
> > for ParallelScan, which will multi-thread by region, load balancing
> > across region servers, we could just alter the parallelScan to issue a
> > deleteRows on each thread, ensuring that the rows to delete in a
> > multi-row delete are from the same region.
> >
> > I agree that a coprocessor would be the fastest method, but is it
> > worth going that route given the limited scenario?
> >
> > Eric
> >
> > -Original Message-
> > From: Selva Govindarajan [mailto:selva.govindara...@esgyn.com]
> > Sent: Wednesday, July 22, 2015 5:17 PM
> > To: d...@trafodion.incubator.apache.org
> > Subject: RE: optimization of deleteColumns?
> >
> > Please give consideration to these options to improve the performance
> > for this scenario:
> >
> > 1) Move the implementation to Java to
> > - Reduce JNI to Java transitions
> > - Enable multiple deletes
> > 2) Use a co-processor to delete
> > 3) Introduce a SQL command like DELETE <column> FROM <table> and teach
> > the optimizer to do a parallel plan and use a rowset to delete the
> > column value.
> >
> > Selva
> >
> >
> >
> > -Original Message-
> > From: Eric Owhadi [mailto:eric.owh...@esgyn.com]
> > Sent: Wednesday, July 22, 2015 3:28 AM
> > To: d...@trafodion.incubator.apache.org
> > Subject: Re: optimization of deleteColumns?
> >
> > Actually, looking at the code further, I believe that there is an even
> > more important possible improvement (I would guess at least 10 times
> > more important than the KeyOnlyFilter trick):
> > The code loops and triggers a single delete per row instead of doing
> > batch deletes, the reason being that the existing deleteRows does not
> > take columns as a parameter. But we could alter it to add that. This
> > would make a cleaner API and allow this optimization. I understand
> > that these ALTER operations are not frequently used, but I can imagine
> > that the DBA doing schema changes on a database with millions of
> > records might not appreciate it if it takes too long to drop a column?
> > Should we improve this? Should I create a JIRA, even if we decide not
> > to work on it, to document the potential improvement? If you think it
> > is worth it, I can assign it to myself as a learning exercise to see
> > if I can go through the full process?
> > Eric
> >
> >
> > On Tue, Jul 21, 2015 at 11:39 PM, Anoop Sharma 
> > 
> > wrote:
> >
> > > Yes, Selva is right. This code is used to delete the specified
> > > column from all rows of a table if that column exists.
> > > This is done as part of the 'alter table drop column' command.
> > >
> > > The specified column is removed from metadata and then from the table.
> > > For correctness of just the drop command, one could remove that
> > > column from metadata and not remove it from the actual hbase table.
> > > This would work, since referencing that column in a query will
> > > return an error at compile time, and one will never reach the point
> > > of selecting it from the table.
> > > However, if a column is later added with the same name, then
> > > incorrect results will be returned due to existing column values
> > > that were not deleted during the drop command.
> > >
> > > anoop
> > >
> > > -Original Message-
> > > From: Selva Govindarajan [mailto:selva.govindara...@esgyn.com]
> > > Sent: Tuesday, July 21, 2015 8:47 PM
> > > To: d...@trafodion.incubator.apache.org
> > > Subject: RE: optimization of deleteColumns?
> > >
> > > Hi Eric,
> > >
> 

[jira] [Created] (TRAFODION-1422) Delete column can be dramatically improved (ALTER statement)

2015-07-24 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1422:
--

 Summary: Delete column can be dramatically improved (ALTER 
statement)
 Key: TRAFODION-1422
 URL: https://issues.apache.org/jira/browse/TRAFODION-1422
 Project: Apache Trafodion
  Issue Type: Improvement
Reporter: Eric Owhadi
Priority: Minor
 Fix For: 2.0-incubating


The current code path for delete column has not been optimized and can be 
greatly improved. See the comments below for several ways to implement the 
optimization.
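
A minimal sketch in Java of the batch-delete idea discussed in the comments
(the table handle, family, and qualifier are assumptions; 0.98-era HBase
client API, not the actual Trafodion code): scan row keys with a
KeyOnlyFilter and delete the dropped column in batches instead of issuing
one Delete RPC per row.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;

public class DropColumnSketch {
    static void dropColumn(HTable table, byte[] family, byte[] qualifier)
            throws IOException {
        Scan scan = new Scan();
        scan.setFilter(new KeyOnlyFilter()); // fetch keys only, no values
        scan.setCacheBlocks(false);          // avoid block cache thrashing
        List<Delete> batch = new ArrayList<Delete>();
        ResultScanner rs = table.getScanner(scan);
        try {
            for (Result r : rs) {
                Delete d = new Delete(r.getRow());
                d.deleteColumns(family, qualifier); // only the dropped column
                batch.add(d);
                if (batch.size() >= 1000) { // flush in chunks
                    table.delete(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) table.delete(batch);
        } finally {
            rs.close();
        }
    }
}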



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1421) Implement parallel Scanner primitive

2015-07-24 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640962#comment-14640962
 ] 

Eric Owhadi commented on TRAFODION-1421:


previous discussion before opening JIRA:

If data is needed in sorted order for an order by clause or for a merge join, 
then the optimizer chooses, or can potentially choose, a plan that will 
ensure sorted order.

This could be done either by reading data in key order if only one partition 
is being read, or by reading data from multiple partitions sequentially if 
data order is preserved across multiple partitions, or by doing a merge of 
multiple streams/partitions where each partition is returning data in sorted 
order, or by doing an external sort on the data returned from each partition 
and then merging, if needed.
The Traf optimizer may or may not be doing all of this at this point.

If an ESP is reading data from multiple partitions/regions, and parallel 
asynchronous functionality is added at the ESP level (this will be similar to 
the PAPA (parallel access partition access) node in the early 
implementation), then we need to make sure that the optimizer is aware of 
this runtime functionality and chooses an appropriate plan by merging sorted 
streams.

anoop

-Original Message-
From: Eric Owhadi [mailto:eric.owh...@esgyn.com]
Sent: Wednesday, July 22, 2015 11:57 AM
To: d...@trafodion.incubator.apache.org
Subject: Parallel scanner?

Hi All,
I have been looking at how we currently use the scanner. It looks like it 
should not be too difficult to inject a parallel scanner instead of the 
default serial scanner, since in many use cases we don't care about the 
ordering of the data retrieved.
Key question: do we sometimes take advantage of the ordering (to do stuff 
like merges), or are the merges that require sorting always done at the ESP 
level anyway?
The question is whether we should have an optional serial scanner and a 
parallel scanner (one with ordering preserved, the other not), or whether we 
could always enable the parallel scanner.
On implementation details, we can build a sophisticated algorithm that 
conserves thread resources and auto-scales the parallelism based on the 
speed of consumption of the code calling next(), or we can simply always go 
with as many threads as there are regions to scan, accepting the fact that 
some threads will wait() if the client's next() code is not consuming fast 
enough.
I can prototype the simple one, then move to thread auto-scaling once done.
The reason I need to know whether we should keep the serial scanner path is 
to know whether I should create whole new wiring for the parallel scanner, 
or whether I can just replace the serial scanner with the parallel one 
(enabling one or the other at config time, just for benchmarking purposes).
Anybody working on this already, or should I give it a try?
Regards,
Eric



> Implement parallel Scanner primitive
> 
>
> Key: TRAFODION-1421
> URL: https://issues.apache.org/jira/browse/TRAFODION-1421
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-cmp, sql-exe
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: performance
> Fix For: 2.0-incubating
>
>
> The ClientScanner API is serial, to preserve key ordering. However, many 
> operators don't care about ordering and would rather get the scan result 
> fast, regardless of order. This JIRA is about providing a parallel scanner 
> that would take care of splitting the work between all region servers as 
> evenly as possible. HBase has had a parallel scanner in the pipe for quite 
> some time (HBASE-9272), but the work has been stalled since October 2013. 
> However, looking at the available code, it looks like a big part of it can 
> be leveraged without requiring a custom HBase build. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (TRAFODION-1421) Implement parallel Scanner primitive

2015-07-24 Thread Eric Owhadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/TRAFODION-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on TRAFODION-1421 started by Eric Owhadi.
--
> Implement parallel Scanner primitive
> 
>
> Key: TRAFODION-1421
> URL: https://issues.apache.org/jira/browse/TRAFODION-1421
> Project: Apache Trafodion
>  Issue Type: Improvement
>  Components: sql-cmp, sql-exe
>Reporter: Eric Owhadi
>Assignee: Eric Owhadi
>  Labels: performance
> Fix For: 2.0-incubating
>
>
> The ClientScanner API is serial, to preserve key ordering. However, many 
> operators don't care about ordering and would rather get the scan result 
> fast, regardless of order. This JIRA is about providing a parallel scanner 
> that would take care of splitting the work between all region servers as 
> evenly as possible. HBase has had a parallel scanner in the pipe for quite 
> some time (HBASE-9272), but the work has been stalled since October 2013. 
> However, looking at the available code, it looks like a big part of it can 
> be leveraged without requiring a custom HBase build. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TRAFODION-1421) Implement parallel Scanner primitive

2015-07-24 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1421:
--

 Summary: Implement parallel Scanner primitive
 Key: TRAFODION-1421
 URL: https://issues.apache.org/jira/browse/TRAFODION-1421
 Project: Apache Trafodion
  Issue Type: Improvement
  Components: sql-cmp, sql-exe
Reporter: Eric Owhadi
 Fix For: 2.0-incubating


The ClientScanner API is serial, to preserve key ordering. However, many 
operators don't care about ordering and would rather get the scan result 
fast, regardless of order. This JIRA is about providing a parallel scanner 
that would take care of splitting the work between all region servers as 
evenly as possible. HBase has had a parallel scanner in the pipe for quite 
some time (HBASE-9272), but the work has been stalled since October 2013. 
However, looking at the available code, it looks like a big part of it can 
be leveraged without requiring a custom HBase build. 
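
A minimal sketch in Java of the idea, not the actual Trafodion
implementation: one worker per region range pushes Results into a shared
queue, so the consumer sees a single unsorted stream (the region ranges and
queue size are assumptions).

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Pair;

public class ParallelScanSketch {
    static BlockingQueue<Result> scanAllRegions(
            final Configuration conf, final String tableName,
            List<Pair<byte[], byte[]>> regionRanges) {
        // Bounded queue: workers block in put() when the next() consumer
        // is not keeping up, which throttles the scan naturally.
        final BlockingQueue<Result> queue =
                new LinkedBlockingQueue<Result>(1024);
        ExecutorService pool =
                Executors.newFixedThreadPool(regionRanges.size());
        for (final Pair<byte[], byte[]> range : regionRanges) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        // HTable is not thread safe, so each worker gets
                        // its own instance.
                        HTable table = new HTable(conf, tableName);
                        try {
                            Scan scan = new Scan(range.getFirst(),
                                                 range.getSecond());
                            ResultScanner rs = table.getScanner(scan);
                            for (Result r : rs) queue.put(r);
                            rs.close();
                        } finally {
                            table.close();
                        }
                    } catch (Exception e) {
                        // A real implementation would surface the error to
                        // the consumer instead of swallowing it.
                    }
                }
            });
        }
        pool.shutdown();
        return queue;
    }
}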



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance

2015-07-24 Thread Eric Owhadi (JIRA)

[ 
https://issues.apache.org/jira/browse/TRAFODION-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640947#comment-14640947
 ] 

Eric Owhadi commented on TRAFODION-1420:


previous discussion on dev list before JIRA creation for reference:

What is codeGen, and do you know of a global setting to change the default? 
Looking at the source code, I don't see where the default scan setting for 
small can be anything but false. Also, looking at the JNI API we have to 
trigger a scan, there is no param that I can see to play with the small 
param, so I don't see how this could be turned on or off through this API.
Let me see about JIRA creation; that would be my first, so I need to learn :-)
Eric


On Fri, Jul 24, 2015 at 1:31 PM, Qifan Chen  wrote:
During the scan improvement work, I recall small scan is turned on by default 
by hbase and should be turned off for large scans.

I wonder if we can confirm that first, and how the flag is set during the 
codeGen phase. This will make the JIRA more accurate.

Secondly, welcome to the sql component and I am more than happy to provide any 
help in compiler area.

Thanks

-Qifan

Sent from my iPhone

> On Jul 24, 2015, at 12:36 PM, Eric Owhadi  wrote:
>
> oh, I see,
> Eric
>
> On Fri, Jul 24, 2015 at 12:34 PM, Carol Pearson 
> wrote:
>
>> I saw that. I was more responding to your comment about metadata only being
>> read once, so not as necessary to optimize.  That's a slightly different
>> tangent, but one that gets overlooked at times.  Startup time to first
>> select is a key metric for a high-performance database, both for initial
>> install/setup/upgrade and on a simple restart.
>>
>> -Carol P.
>>
>> On Fri, Jul 24, 2015 at 10:30 AM, Eric Owhadi 
>> wrote:
>>
>>> Anoop is suggesting to use this not only for metadata, but for any query
>>> where the compiler evaluates that it would be appropriate to turn the
>>> feature on.
>>> Eric
>>>
>>> On Fri, Jul 24, 2015 at 12:26 PM, Carol Pearson <
>>> carol.pearson...@gmail.com>
>>> wrote:
>>>
>>>> If it only happens once, does that mean that this optimization might be
>>>> a good one at startup time?  If one of the failure modes is to bounce
>>>> something, or our users are simply restarting after some sort of
>>>> maintenance, startup would hit a lot of metadata all at once.
>>>>
>>>>
>>>> Thanks,
>>>> -Carol P.

 On Fri, Jul 24, 2015 at 9:57 AM, Eric Owhadi 
 wrote:

> more reading:
> https://issues.apache.org/jira/browse/HBASE-9488
>
> turns out that the performance improvement comes from 2 things: the 3 RPCs
> collapsed to 1, and the use of pread instead of seek+read.
> see the facebook branch: https://issues.apache.org/jira/browse/HBASE-7266
> they have enabled pread all the way, even for long scans, using other
> features to implement the prefetch needed for long scans.
>
> I incorrectly stated that the criterion was whether your result set fits
> in your cache size. Reading deeper, it looks like the criterion should be:
> if the scan range is within one data block (64K), then we should set small
> scan.
>
> Looking at the code, it seems that if you incorrectly set small on a
> non-small scan, you have just optimized incorrectly. It will work, but
> slower...
> There were some bugs in the early implementation not supporting scan
> ranges crossing regions well, but I see that the patches correcting them
> are in the branch of HBase we use.
>
> and yes, the optimization works in the 2 cases you mention.
>
> on what I am observing: I only saw the traffic on the first invoke; I have
> not tried to observe what happens over several runs. So nothing to worry
> about.
>
> Given the performance boost on small scans (3X), I think what you propose
> ("We do have estimates of accessed rows at compile time and could turn
> this opt on, if rows are small.") should be a good candidate to add to the
> list of stuff to do to improve perf...
>
> Eric
>
>
>
> On Fri, Jul 24, 2015 at 9:29 AM, Anoop Sharma <
>> anoop.sha...@esgyn.com>
> wrote:
>
>> There are 2 kind of scans that are done. One is a unique scan where
>>> we
> know
>> that only one unique row or a set of unique rows will be returned.
>> And second is a non-unique scan where multiple rows are returned.
>>
>> Does this optimization apply to both of these cases?
>>
>> We do have estimates of accessed rows at compile time and could
>> turn
 this
>> opt on, if rows are small.
>> What happens if this flag is set and the scan is not small or
>> doesn't
 fit
>> in
>> the cache? Will that work with some perf degradation or will it
>> fail?
>>
>> We only read metadata information from hbase when the table is used
>>> for
> the
>> fir

[jira] [Created] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance

2015-07-24 Thread Eric Owhadi (JIRA)
Eric Owhadi created TRAFODION-1420:
--

 Summary: Use ClientSmallScanner for small scans to improve 
performance
 Key: TRAFODION-1420
 URL: https://issues.apache.org/jira/browse/TRAFODION-1420
 Project: Apache Trafodion
  Issue Type: Improvement
  Components: sql-cmp, sql-exe
Reporter: Eric Owhadi
 Fix For: 2.0-incubating


HBase implements an optimization for small scans (defined as scanning less 
than one data block, i.e. 64 KB) resulting in a 3X performance improvement. 
The underlying trick is to cut the RPC calls from 3 (OpenScan/Next/Close) 
down to 1, and to use stateless pread instead of the stateful, locking 
seek/read method to read data. This JIRA is about making the compiler aware 
of whether a scan will act on a single data block (small) or not, and passing 
that information to the executor so that it can set the right scan parameter 
(scan.setSmall(boolean)).
reference:
https://issues.apache.org/jira/browse/HBASE-9488
https://issues.apache.org/jira/browse/HBASE-7266




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)