[jira] [Assigned] (TRAFODION-2595) Query result with cqd HBASE_DOP_PARALLEL_SCANNER ‘1.0’ not correct
[ https://issues.apache.org/jira/browse/TRAFODION-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Owhadi reassigned TRAFODION-2595:
--------------------------------------

    Assignee: Eric Owhadi

> Query result with cqd HBASE_DOP_PARALLEL_SCANNER ‘1.0’ not correct
> ------------------------------------------------------------------
>
>                 Key: TRAFODION-2595
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2595
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-exe
>    Affects Versions: 2.1-incubating
>            Reporter: Yuan Liu
>            Assignee: Eric Owhadi
>             Fix For: any
>
> With cqd HBASE_DOP_PARALLEL_SCANNER ‘1.0’, the query executed very fast, but the result is not correct.
> Below is the result of a test query.
>
> --Without cqd HBASE_DOP_PARALLEL_SCANNER ‘1.0’
> --- 12315 row(s) selected.
> Start Time      2017/04/18 13:15:52.946896
> End Time        2017/04/18 13:51:46.141087
> Elapsed Time    00:35:53.194191
> Compile Time    00:00:00.070804
> Execution Time  00:35:53.123157
>
> --With cqd HBASE_DOP_PARALLEL_SCANNER ‘1.0’
> --- 3139 row(s) selected.
> Start Time      2017/04/18 11:03:19.265742
> End Time        2017/04/18 11:04:34.434705
> Elapsed Time    00:01:15.168963
> Compile Time    00:00:01.654184
> Execution Time  00:01:13.514594

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Created] (TRAFODION-2591) Creating INDEX on ADDED column gives corrupted results
Eric Owhadi created TRAFODION-2591:
---------------------------------------

             Summary: Creating INDEX on ADDED column gives corrupted results
                 Key: TRAFODION-2591
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2591
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-exe
    Affects Versions: 2.2-incubating
            Reporter: Eric Owhadi

CREATE TABLE "car" (
  ":car_id" CHAR(27) CHARACTER SET ISO88591 NOT NULL PRIMARY KEY,
  "a" CHAR(27) CHARACTER SET ISO88591 NOT NULL,
  "b" DATE,
  "c" VARCHAR(30),
  "d" DATE,
  "e" VARCHAR(30),
  "f" CHAR(1) NOT NULL,
  ":brand" CHAR(1) CHARACTER SET ISO88591,
  ":horsePower" INT,
  ":featuresFlag" INT
);

UPSERT INTO TABLE "car" VALUES
 ('sdcABddOdsSk5DzWZxQFO2BkDLc','bbHABhqOdsSk5DwdEsQFO2BkDss',CURRENT_DATE,'me',CURRENT_DATE,'me','A','a',122,7),
 ('wwsvchqOhmksh564ZxsdOSasdcx','bbHABhqOdsSk5DwdEsQFO2BkDss',CURRENT_DATE,'me',CURRENT_DATE,'me','A','b',121,3),
 ('fcxdBhDlsdFc5DzWZaSFO2Bjh67','bbHABhqOdsSk5DwdEsQFO2BkDss',CURRENT_DATE,'me',CURRENT_DATE,'me','A','c',133,2),
 ('ldsxRdqkKkSk5DzWZxQFOsd7shF','bbHABhqOdsSk5DwdEsQFO2BkDss',CURRENT_DATE,'me',CURRENT_DATE,'me','A','d',201,1);

select ":car_id" from "car"; -- data good

ALTER TABLE "car" ADD COLUMN "fk_owns_person" CHAR(27) CHARACTER SET ISO88591;

select ":car_id" from "car"; -- data good

CREATE INDEX "car_idx_fk_owns_person" ON "car" ("fk_owns_person");

select ":car_id" from "car"; -- corrupted data

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Created] (TRAFODION-2512) index access with MDAM not chosen where predicate is range spec
Eric Owhadi created TRAFODION-2512:
---------------------------------------

             Summary: index access with MDAM not chosen where predicate is range spec
                 Key: TRAFODION-2512
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2512
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-cmp
    Affects Versions: 2.2-incubating
            Reporter: Eric Owhadi

create table tbl (
  k1 int not null,
  k2 int not null,
  ts timestamp not null,
  a char(10),
  b varchar(30),
  c largeint,
  primary key (k1,k2,ts))
salt using 8 partitions
division by (date_trunc('MONTH', ts));

upsert using load into tbl
  select num/1000, num, DATEADD(SECOND,-num,CURRENT_TIMESTAMP),
         cast(num as char(10)), cast(num as varchar(30)), num*1000
  from (select 1000*x1000+100*x100+10*x10+1*x1+1000*x1000+100*x100+10*x10+x1 as num
        from (values (0)) seed(c)
        transpose 0,1,2,3,4,5,6,7,8,9 as x1
        transpose 0,1,2,3,4,5,6,7,8,9 as x10
        transpose 0,1,2,3,4,5,6,7,8,9 as x100
        transpose 0,1,2,3,4,5,6,7,8,9 as x1000
        transpose 0,1,2,3,4,5,6,7,8,9 as x1
        transpose 0,1,2,3,4,5,6,7,8,9 as x10
        transpose 0,1,2,3,4,5,6,7,8,9 as x100
        transpose 0,1,2,3,4,5,6,7,8,9 as x1000) T;

create index tbl_idx_b on tbl(b) salt like table;
update statistics for table tbl on every column sample;

prepare s from select k1 from tbl where b = '1234567';
prepare ss from select k1 from tbl where b like '1234567%';

See how s correctly picks index access. See how ss, even though the LIKE is correctly transformed into a range spec, ends up doing a full main-table scan instead of going after the index on b using MDAM and the range spec inside the MDAM disjunct.

SQL>prepare s from select k1 from tbl where b = '1234567';
--- SQL command prepared.
SQL>explain options 'f' s;

LC   RC  OP   OPERATOR              OPT       DESCRIPTION           CARD
---------------------------------------------------------------------------
1    .   2    root                                                  1.00E+000
.    .   1    trafodion_index_scan            IDX_TBL_B             1.00E+000

--- SQL operation complete.
SQL>explain s;

------------------------------------------------------------------ PLAN SUMMARY
MODULE_NAME .............. DYNAMICALLY COMPILED
STATEMENT_NAME ........... S
PLAN_ID .................. 212355075543213868
ROWS_OUT ................. 1
EST_TOTAL_COST ........... 0.15
STATEMENT ................ select k1 from tbl where b = '1234567'

------------------------------------------------------------------ NODE LISTING
ROOT ======================================  SEQ_NO 2        ONLY CHILD 1
REQUESTS_IN .............. 1
ROWS_OUT ................. 1
EST_OPER_COST ............ 0
EST_TOTAL_COST ........... 0.15
DESCRIPTION
  max_card_est ........... 1
  fragment_id ............ 0
  parent_frag ............ (none)
  fragment_type .......... master
  statement_index ........ 0
  affinity_value ......... 0
  max_max_cardinality .... 1
  total_overflow_size .... 0.00 KB
  xn_access_mode ......... read_only
  xn_autoabort_interval .. 0
  auto_query_retry ....... enabled
  plan_version ........... 2,600
  embedded_arkcmp ........ used
  ObjectUIDs ............. 636255280475776270
  select_list ............ TRAFODION.ERIC.IDX_TBL_B.K1
  input_variables ........ %('1234567')

TRAFODION_INDEX_SCAN ======================  SEQ_NO 1        NO CHILDREN
TABLE_NAME ............... TBL
REQUESTS_IN .............. 1
ROWS_OUT ................. 1
EST_OPER_COST ............ 0.15
EST_TOTAL_COST ........... 0.15
DESCRIPTION
  max_card_est ........... 1
  fragment_id ............ 0
  parent_frag ............ (none)
  fragment_type .......... master
  scan_type .............. subset scan limited by mdam of index TRAFODION.ERIC.IDX_TBL_B(TRAFODION.ERIC.TBL)
  object_type ............ Trafodion
  cache_size ............. 100
  probes ................. 1
  rows_accessed .......... 1
  column_retrieved ....... #1:1
  key_columns ............ TRAFODION.ERIC.IDX_TBL_B._SALT_, TRAFODION.ERIC.IDX_TBL_B.B, TRAFODION.ERIC.IDX_TBL_B._DIVISION_1_, TRAFODION.ERIC.IDX_TBL_B.K1, TRAFODION.ERIC.IDX_TBL_B.K2, TRAFODION.ERIC.IDX_TBL_B.TS
  mdam_disjunct .......... (TRAFODION.ERIC.IDX_TBL_B.B = %('1234567'))

--- SQL operation complete.

SQL>prepare ss from select k1 from tbl where b like '1234567%';
--- SQL command prepared.
SQL>explain options 'f' ss;

LC   RC  OP   OPERATOR              OPT       DESCRIPTION           CARD
---------------------------------------------------------------------------
1    .   2    root                                                  6.25E+006
.    .   1    trafodion_index_scan            IDX_TBL_B             6.2
[jira] [Created] (TRAFODION-2488) merge statement with where clause refuses to work in Batch mode
Eric Owhadi created TRAFODION-2488:
---------------------------------------

             Summary: merge statement with where clause refuses to work in Batch mode
                 Key: TRAFODION-2488
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2488
             Project: Apache Trafodion
          Issue Type: Bug
          Components: client-jdbc-t4, connectivity-general
            Reporter: Eric Owhadi

merge into t on k1 = ? and k2 = ?
when matched then update set (a,b,c,d) = (?,?,?,?) where c < ?
when not matched then insert (k1,k2,a,b,c,d) values (?,?,?,?,?,?);

refuses to run in batched mode using the JDBC Type 4 driver. The error is:

*** ERROR[30019] Statement was compiled with scalar parameters and array values used during execution.

Workaround:

merge into t on k1 = ? and k2 = ?
when matched then update set (a,b,c,d) = (?,?,?,?) where c < TIMESTAMP '2014-01-27 17:11:10'
when not matched then insert (k1,k2,a,b,c,d) values (?,?,?,?,?,?);

Removing the parametric where clause resolves the problem, if we can do so.

DDL:
CREATE TABLE t
  (
    k1 CHAR(31) CHARACTER SET ISO88591 COLLATE DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED
  , k2 VARCHAR(256 CHARS) CHARACTER SET UTF8 COLLATE DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED
  , a LARGEINT DEFAULT NULL NOT SERIALIZED
  , b VARCHAR(50) CHARACTER SET ISO88591 COLLATE DEFAULT DEFAULT NULL NOT SERIALIZED
  , c TIMESTAMP(6) NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED
  , d CHAR(1) CHARACTER SET ISO88591 COLLATE DEFAULT NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED
  , PRIMARY KEY (k1 ASC, k2 ASC)
  )
  SALT USING 8 PARTITIONS ON (k1)
  ATTRIBUTES ALIGNED FORMAT
  HBASE_OPTIONS
  (
    DATA_BLOCK_ENCODING = 'FAST_DIFF',
    COMPRESSION = 'SNAPPY'
  );

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Created] (TRAFODION-2464) failure to upsert into a table with an index
Eric Owhadi created TRAFODION-2464:
---------------------------------------

             Summary: failure to upsert into a table with an index
                 Key: TRAFODION-2464
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2464
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-cmp
    Affects Versions: 2.2-incubating
            Reporter: Eric Owhadi
         Attachments: osim.tar

create table files (
  directory_id char(36) NOT NULL,
  name varchar(256) NOT NULL,
  fsize largeint,
  owner varchar(50),
  primary key (directory_id,name))
SALT USING 8 partitions on (directory_id)
HBASE_OPTIONS (DATA_BLOCK_ENCODING = 'FAST_DIFF', COMPRESSION = 'SNAPPY');

create index files_idx_by_directory_id on files(name,directory_id)
SALT LIKE TABLE
HBASE_OPTIONS (DATA_BLOCK_ENCODING = 'FAST_DIFF', COMPRESSION = 'SNAPPY');

prepare s from upsert into files values (?,?,?,?);

*** ERROR[3241] This MERGE statement is not supported. Reason: Non-unique ON clause not allowed with INSERT. [2017-01-20 00:38:54]
*** ERROR[8822] The statement was not prepared. [2017-01-20 00:38:54]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (TRAFODION-2422) populateSortCols was flagged as major perf offender during profiling
Eric Owhadi created TRAFODION-2422:
---------------------------------------

             Summary: populateSortCols was flagged as major perf offender during profiling
                 Key: TRAFODION-2422
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2422
             Project: Apache Trafodion
          Issue Type: Improvement
          Components: sql-cmp
            Reporter: Eric Owhadi
            Priority: Minor
             Fix For: 2.2-incubating

The reason is an unbounded iterative string search on a huge string.

Also added a minor fix: when no category is set in the log conf file, extra code is executed, potentially impacting performance.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
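[Editor's note] The JIRA above does not show the offending code, but the pattern it names is common enough to sketch. The following is a hypothetical illustration (the function names and the comma-separated column-list representation are assumptions, not Trafodion's actual code) of why an unbounded, iterative substring search over one huge string is a profiling hotspot, and how paying the parse cost once fixes it:

```python
# Hypothetical sketch, not Trafodion's populateSortCols: repeated scanning of
# a huge comma-separated string vs. a one-time split into a hash set.

def is_sort_col_scan(sort_cols: str, name: str) -> bool:
    # Naive pattern: re-parse the whole big string on every lookup.
    # Each call is O(len(sort_cols)); N lookups cost O(N * len(sort_cols)).
    return name in sort_cols.split(",")

def make_sort_col_index(sort_cols: str) -> set:
    # Pay the split once; every later lookup is O(1) on average.
    return set(sort_cols.split(","))

cols = ",".join(f"COL{i}" for i in range(10_000))  # a "huge string"
idx = make_sort_col_index(cols)
```

With the set built up front, thousands of membership checks no longer rescan the full string each time, which is the kind of change that removes this sort of profiler hotspot.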
[jira] [Created] (TRAFODION-2416) HBASE OPTIONS on CREATE INDEX have no effect
Eric Owhadi created TRAFODION-2416:
---------------------------------------

             Summary: HBASE OPTIONS on CREATE INDEX have no effect
                 Key: TRAFODION-2416
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2416
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-exe
    Affects Versions: 2.1-incubating, 2.2-incubating
            Reporter: Eric Owhadi

Create a table t, then create an index with syntax like:

CREATE INDEX t_idx_by_a ON t (a)
SALT LIKE TABLE
HBASE_OPTIONS (DATA_BLOCK_ENCODING = 'FAST_DIFF', COMPRESSION = 'SNAPPY');

Then verify using the hbase shell and notice that DATA_BLOCK_ENCODING and COMPRESSION are set to NONE:

describe "TRAFODION.MYSCHEMA.T_IDX_BY_A"
Table TRAFODION.MYSCHEMA.T_IDX_BY_A is ENABLED
TRAFODION.MYSCHEMA.T_IDX_BY_A
COLUMN FAMILIES DESCRIPTION
{NAME => '#1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'mt_', BLOOMFILTER => 'ROW', VERSIONS => '2', IN_MEMORY => 'true', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0220 seconds

The workaround is to alter the table using the hbase shell...

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (TRAFODION-2415) wrong plan picked when using predicates on multiple columns of a multi-column INDEX
Eric Owhadi created TRAFODION-2415:
---------------------------------------

             Summary: wrong plan picked when using predicates on multiple columns of a multi-column INDEX
                 Key: TRAFODION-2415
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2415
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-cmp
            Reporter: Eric Owhadi

create table t(
  a char(1) not null,
  b char(1) not null,
  c char(1) not null,
  d char(1) not null,
  e CHAR(1) NOT NULL,
  f SMALLINT UNSIGNED NOT NULL,
  g SMALLINT UNSIGNED NOT NULL,
  h INT UNSIGNED NOT NULL,
  customer CHAR(20) NOT NULL,
  count INT UNSIGNED,
  price LARGEINT,
  PRIMARY KEY (a,b,c,d,e,f,g,h,customer)
) SALT USING 4 PARTITIONS;

CREATE INDEX t_idx_by_b ON t (b,count,price);
CREATE INDEX t_idx_by_c ON t (c,count,price);
CREATE INDEX t_idx_by_d ON t (d,count,price);
CREATE INDEX t_idx_by_e ON t (e,count,price);
CREATE INDEX t_idx_by_f ON t (f,count,price);
CREATE INDEX t_idx_by_g ON t (g,count,price);
CREATE INDEX t_idx_by_h ON t (h,count,price);
CREATE INDEX t_idx_by_count ON t (customer,count,price);

SELECT e, SUM(price) FROM t WHERE b IN ('1','2','3') AND f IN (10,20,30) GROUP BY 1;

generates a wrong plan doing a full scan on t_idx_by_f, while

SELECT e, SUM(price) FROM t WHERE f IN (10,20,30) GROUP BY 1;

generates a good plan doing MDAM on t_idx_by_f only.

Using cqd rangespec_transformation 'off'; makes the problem go away.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (TRAFODION-2413) row total length is wrongly calculated in Aligned format
Eric Owhadi created TRAFODION-2413:
---------------------------------------

             Summary: row total length is wrongly calculated in Aligned format
                 Key: TRAFODION-2413
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2413
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-cmp
    Affects Versions: 2.1-incubating, 2.2-incubating
            Reporter: Eric Owhadi

Row_Total_Length, as reported in DB Manager or retrieved with the following queries, is wrong for an aligned-format table. The logic wrongly assumes that the table uses a non-aligned format. This has consequences beyond just information: it also hurts performance, since internal buffers are wrongly over-inflated and BMO operators can spill to disk earlier than necessary.

set schema "_MD_";

-- Get the Object UID
select * from objects where schema_name = 'MYSCHEMA' and object_name = 'MYTABLE';

select * from tables where table_uid = 3146444098927464648; -- shows 630 as total row length

-- display columns (shows the same info as DB Manager)
select column_name, column_number, sql_data_type, column_size
from columns where object_uid = 3146444098927464648 order by 2;

TABLE_UID            ROW_FORMAT  IS_AUDITED  ROW_DATA_LENGTH  ROW_TOTAL_LENGTH  KEY_LENGTH  NUM_SALT_PARTNS  FLAGS
-------------------  ----------  ----------  ---------------  ----------------  ----------  ---------------  -----
8556393311998525665  AF          Y           51               630               374         0

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
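[Editor's note] The inflation described above can be illustrated with back-of-the-envelope arithmetic. This is a hypothetical model, not Trafodion's actual row-length formula: it assumes an aligned-format row packs all columns into a single cell with small alignment padding, while a non-aligned estimate charges a per-cell overhead for every column, which is roughly why estimating an aligned table with non-aligned logic over-inflates buffers:

```python
# Hypothetical model of aligned vs. non-aligned row-length estimates.
# PER_CELL_OVERHEAD is an assumed per-cell HBase key/metadata cost in bytes.

PER_CELL_OVERHEAD = 32

def aligned_row_length(col_sizes, align=4):
    # One packed cell: each column's bytes plus padding up to the alignment.
    total = 0
    for size in col_sizes:
        pad = (-total) % align   # pad the current offset to the alignment
        total += pad + size
    return total

def non_aligned_row_length(col_sizes):
    # One cell per column: every column pays the full per-cell overhead.
    return sum(PER_CELL_OVERHEAD + size for size in col_sizes)

cols = [4, 4, 8, 10, 2]                    # five hypothetical column sizes
packed = aligned_row_length(cols)          # small: packed representation
estimated = non_aligned_row_length(cols)   # much larger: overhead per column
```

Under these assumed numbers the non-aligned estimate is several times the packed size, the same direction of error as the 630 vs. 51 figures in the report.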
[jira] [Created] (TRAFODION-2401) Creating Foreign Key constraint NOT ENFORCED still creates a system index
Eric Owhadi created TRAFODION-2401:
---------------------------------------

             Summary: Creating Foreign Key constraint NOT ENFORCED still creates a system index
                 Key: TRAFODION-2401
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2401
             Project: Apache Trafodion
          Issue Type: Improvement
          Components: sql-cmp
    Affects Versions: 2.2-incubating
            Reporter: Eric Owhadi

If you create a table and declare a foreign key NOT ENFORCED (to help the optimizer make good plan-selection decisions while avoiding the cost of FK checking during ingestion), Trafodion still creates a system index, which impacts ingest performance. The index is never used, since the FK is not enforced, but we still pay the cost of index maintenance.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Closed] (TRAFODION-1421) Implement parallel Scanner primitive
[ https://issues.apache.org/jira/browse/TRAFODION-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Owhadi closed TRAFODION-1421.
----------------------------------

> Implement parallel Scanner primitive
> ------------------------------------
>
>                 Key: TRAFODION-1421
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-1421
>             Project: Apache Trafodion
>          Issue Type: Improvement
>          Components: sql-cmp, sql-exe
>            Reporter: Eric Owhadi
>            Assignee: Eric Owhadi
>              Labels: performance
>             Fix For: 2.1-incubating
>
> The ClientScanner API is serial, to preserve key ordering. However, many operators don't care about ordering and would rather get the scan results fast, regardless of order. This JIRA is about providing a parallel scanner that would take care of splitting the work evenly between all region servers if possible. HBase has had a parallel scanner in the pipe for quite some time (HBASE-9272), but that work has been stalled since October 2013. However, looking at the available code, it looks like a big part can be leveraged without requiring a custom HBase build.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Resolved] (TRAFODION-1421) Implement parallel Scanner primitive
[ https://issues.apache.org/jira/browse/TRAFODION-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Owhadi resolved TRAFODION-1421.
------------------------------------
    Resolution: Fixed

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Closed] (TRAFODION-2009) parallel scanner failing on tpcds.store_sales table
[ https://issues.apache.org/jira/browse/TRAFODION-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Owhadi closed TRAFODION-2009.
----------------------------------

> parallel scanner failing on tpcds.store_sales table
> ---------------------------------------------------
>
>                 Key: TRAFODION-2009
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2009
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-exe
>    Affects Versions: 2.1-incubating
>         Environment: on dev workstation
>            Reporter: Eric Owhadi
>            Assignee: Eric Owhadi
>            Priority: Minor
>             Fix For: 2.1-incubating
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> CREATE TABLE TRAFODION.SEABASE.STORE_SALES
>   (
>     SS_SOLD_DATE_SK        INT DEFAULT NULL NOT SERIALIZED
>   , SS_ITEM_SK             INT NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED
>   , SS_TICKET_NUMBER       INT NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED
>   , SS_SOLD_TIME_SK        INT DEFAULT NULL NOT SERIALIZED
>   , SS_CUSTOMER_SK         INT DEFAULT NULL NOT SERIALIZED
>   , SS_CDEMO_SK            INT DEFAULT NULL NOT SERIALIZED
>   , SS_HDEMO_SK            INT DEFAULT NULL NOT SERIALIZED
>   , SS_ADDR_SK             INT DEFAULT NULL NOT SERIALIZED
>   , SS_STORE_SK            INT DEFAULT NULL NOT SERIALIZED
>   , SS_PROMO_SK            INT DEFAULT NULL NOT SERIALIZED
>   , SS_QUANTITY            INT DEFAULT NULL NOT SERIALIZED
>   , SS_WHOLESALE_COST      REAL DEFAULT NULL NOT SERIALIZED
>   , SS_LIST_PRICE          REAL DEFAULT NULL NOT SERIALIZED
>   , SS_SALES_PRICE         REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_DISCOUNT_AMT    REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_SALES_PRICE     REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_WHOLESALE_COST  REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_LIST_PRICE      REAL DEFAULT NULL NOT SERIALIZED
>   , SS_EXT_TAX             REAL DEFAULT NULL NOT SERIALIZED
>   , SS_COUPON_AMT          REAL DEFAULT NULL NOT SERIALIZED
>   , SS_NET_PAID            REAL DEFAULT NULL NOT SERIALIZED
>   , SS_NET_PAID_INC_TAX    REAL DEFAULT NULL NOT SERIALIZED
>   , SS_NET_PROFIT          REAL DEFAULT NULL NOT SERIALIZED
>   , PRIMARY KEY (SS_SOLD_DATE_SK ASC, SS_ITEM_SK ASC, SS_TICKET_NUMBER ASC)
>   )
>   SALT USING 8 PARTITIONS
>     ON (SS_ITEM_SK, SS_TICKET_NUMBER)
>   ATTRIBUTES ALIGNED FORMAT
>   HBASE_OPTIONS
>   (
>     DATA_BLOCK_ENCODING = 'FAST_DIFF',
>     BLOCKSIZE = '131072'
>   );
>
> load into store_sales select
>     SS_SOLD_DATE_SK
>   , SS_ITEM_SK
>   , SS_TICKET_NUMBER
>   , SS_SOLD_TIME_SK
>   , SS_CUSTOMER_SK
>   , SS_CDEMO_SK
>   , SS_HDEMO_SK
>   , SS_ADDR_SK
>   , SS_STORE_SK
>   , SS_PROMO_SK
>   , SS_QUANTITY
>   , SS_WHOLESALE_COST
>   , SS_LIST_PRICE
>   , SS_SALES_PRICE
>   , SS_EXT_DISCOUNT_AMT
>   , SS_EXT_SALES_PRICE
>   , SS_EXT_WHOLESALE_COST
>   , SS_EXT_LIST_PRICE
>   , SS_EXT_TAX
>   , SS_COUPON_AMT
>   , SS_NET_PAID
>   , SS_NET_PAID_INC_TAX
>   , SS_NET_PROFIT
> from hive.hive.store_sales;
>
> set statistics on;
> cqd parallel_num_esps '1';
> cqd hbase_dop_parallel_scanner '1.0';
> prepare xx from select count(*) from store_sales where ss_customer_sk between 1000 and 2;
> execute xx;
>
> The result will return a wrong count.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Resolved] (TRAFODION-2009) parallel scanner failing on tpcds.store_sales table
[ https://issues.apache.org/jira/browse/TRAFODION-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Owhadi resolved TRAFODION-2009.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1-incubating

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TRAFODION-2009) parallel scanner failing on tpcds.store_sales table
[ https://issues.apache.org/jira/browse/TRAFODION-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297190#comment-15297190 ]

Eric Owhadi commented on TRAFODION-2009:
----------------------------------------

Set severity to minor because parallel_scanner is an experimental feature disabled by default.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (TRAFODION-2009) parallel scanner failing on tpcds.store_sales table
Eric Owhadi created TRAFODION-2009:
---------------------------------------

             Summary: parallel scanner failing on tpcds.store_sales table
                 Key: TRAFODION-2009
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2009
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-exe
    Affects Versions: 2.1-incubating
         Environment: on dev workstation
            Reporter: Eric Owhadi
            Assignee: Eric Owhadi
            Priority: Minor

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TRAFODION-1914) optimize "added columns" in indexes
[ https://issues.apache.org/jira/browse/TRAFODION-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289108#comment-15289108 ]

Eric Owhadi commented on TRAFODION-1914:
----------------------------------------

Hi Ming,
No, it is a different thing. In my JIRA, there is no limitation of it being only for a unique index.
Hope that makes sense,
Eric

> optimize "added columns" in indexes
> -----------------------------------
>
>                 Key: TRAFODION-1914
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-1914
>             Project: Apache Trafodion
>          Issue Type: Improvement
>          Components: sql-cmp
>            Reporter: Eric Owhadi
>
> The current CREATE INDEX feature will always put each column added to the index in the clustering key. But sometimes users just want to add columns to the index to avoid having to probe back to the primary table to fetch just one or two columns. Copying these columns into the index can avoid a probe back to the main table and therefore improve performance. The current implementation allows this, but will always put the extra columns in the clustering key. That is not optimal, and very bad in the case of VARCHAR, since VARCHARs are exploded to their max size when part of the clustering key. So this JIRA is about altering the syntax of CREATE INDEX to flag columns that are added but should not be part of the clustering key.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (TRAFODION-1914) optimize "added columns" in indexes
Eric Owhadi created TRAFODION-1914:
---------------------------------------

             Summary: optimize "added columns" in indexes
                 Key: TRAFODION-1914
                 URL: https://issues.apache.org/jira/browse/TRAFODION-1914
             Project: Apache Trafodion
          Issue Type: Improvement
          Components: sql-cmp
            Reporter: Eric Owhadi

The current CREATE INDEX feature will always put each column added to the index in the clustering key. But sometimes users just want to add columns to the index to avoid having to probe back to the primary table to fetch just one or two columns. Copying these columns into the index can avoid a probe back to the main table and therefore improve performance. The current implementation allows this, but will always put the extra columns in the clustering key. That is not optimal, and very bad in the case of VARCHAR, since VARCHARs are exploded to their max size when part of the clustering key. So this JIRA is about altering the syntax of CREATE INDEX to flag columns that are added but should not be part of the clustering key.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
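[Editor's note] The VARCHAR key-explosion cost that TRAFODION-1914 describes can be made concrete with simple arithmetic. This sketch uses assumed sizes and a simplified model (fixed-width key padding, a small length prefix for non-key storage), not Trafodion's actual key encoding:

```python
# Hypothetical arithmetic: cost of a VARCHAR column stored in the clustering
# key (padded to its declared max) vs. kept as a non-key covering payload.

def key_bytes_if_clustering(declared_max: int) -> int:
    # Key encodings must be fixed-width comparable, so pad to the max size.
    return declared_max

def payload_bytes_if_covering(actual_len: int, len_prefix: int = 2) -> int:
    # Stored outside the key: actual bytes plus a small length prefix.
    return len_prefix + actual_len

# A VARCHAR(4000) column whose typical value is 20 characters:
in_key = key_bytes_if_clustering(4000)      # 4000 bytes per index row
covering = payload_bytes_if_covering(20)    # 22 bytes per index row
ratio = in_key // covering                  # roughly two orders of magnitude
```

Under these assumed numbers, the same column costs about 180x more per index row when forced into the clustering key, which is why a syntax for non-key added columns pays off.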
[jira] [Closed] (TRAFODION-1900) Optimize MDAM scans with small scanner
[ https://issues.apache.org/jira/browse/TRAFODION-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Owhadi closed TRAFODION-1900.
----------------------------------

> Optimize MDAM scans with small scanner
> --------------------------------------
>
>                 Key: TRAFODION-1900
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-1900
>             Project: Apache Trafodion
>          Issue Type: Improvement
>          Components: sql-exe
>            Reporter: Eric Owhadi
>            Assignee: Eric Owhadi
>             Fix For: 2.0-incubating
>
> When doing MDAM scans, we perform interlaced scans for the PROBE and for the real scan. The probes always return only one row, and we then close the scanner immediately, so they should always use the small scanner. I will make this conditional on the existing CQD HBASE_SMALL_SCANNER (either SYSTEM or ON). In addition, caching the blocks retrieved by a probe means we should always get at least one successful cache hit on the next MDAM scan, so forcing caching ON for the MDAM probe is a good idea. Again, I will make this forcing conditional on HBASE_SMALL_SCANNER SYSTEM or ON.
> Then for the real-scan part of MDAM, I will use the following heuristic: if the previous scan fit in one HBase block, then it is likely that the next will also fit in one HBase block, so enable the small scanner for the next scan. Again, all of this only if the CQD above is ON or SYSTEM.
> Results of using the small scanner on MDAM when it makes sense showed a 1.39X speed improvement...

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
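[Editor's note] The decision logic described in TRAFODION-1900 can be sketched in a few lines. The function and parameter names here are hypothetical (this is not the actual Trafodion executor code), but the rules follow the issue text: probes always qualify for the small scanner, the real scan qualifies when the previous scan fit in one HBase block, and everything is gated on the HBASE_SMALL_SCANNER CQD being ON or SYSTEM:

```python
# Sketch of the small-scanner heuristic from TRAFODION-1900.
# Names and the block-size constant are illustrative assumptions.

BLOCK_SIZE = 65536  # assumed HBase block size, bytes

def use_small_scanner(cqd_small_scanner: str, is_probe: bool,
                      prev_scan_bytes: int) -> bool:
    if cqd_small_scanner not in ("ON", "SYSTEM"):
        return False                       # the CQD gates the whole feature
    if is_probe:
        return True                        # probes return one row: always small
    return prev_scan_bytes <= BLOCK_SIZE   # real scan: previous scan fit in
                                           # one block, so the next likely will
```

For example, a probe qualifies regardless of the previous scan's size, while a real scan whose predecessor spanned many blocks falls back to the regular scanner.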
[jira] [Resolved] (TRAFODION-1900) Optimize MDAM scans with small scanner
[ https://issues.apache.org/jira/browse/TRAFODION-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Owhadi resolved TRAFODION-1900. Resolution: Fixed > Optimize MDAM scans with small scanner > -- > > Key: TRAFODION-1900 > URL: https://issues.apache.org/jira/browse/TRAFODION-1900 > Project: Apache Trafodion > Issue Type: Improvement > Components: sql-exe >Reporter: Eric Owhadi >Assignee: Eric Owhadi > Fix For: 2.0-incubating > > > When doing MDAM scans, we perform interlaced scans for the PROBE and for > the real scan. The probes always return only one row, then we close the scanner > immediately, so they should always use the small scanner. I will make it > conditional on the existing CQD HBASE_SMALL_SCANNER (either SYSTEM or ON). In > addition, blocks retrieved by a probe should always get at least one > successful cache hit on the next MDAM scan, so forcing > caching ON for MDAM probes is a good idea. Again, I will make this forcing > conditional on HBASE_SMALL_SCANNER SYSTEM or ON. > Then for the real scan part of MDAM, I will use the following heuristic: if the > previous scan fit in one HBase block, then it is likely that the next will > also fit in one HBase block, so enable the small scanner for the next scan. > Again, all this only if the CQD above is ON or SYSTEM. > Results of using the small scanner on MDAM where it makes sense showed a 1.39X > speed improvement... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TRAFODION-1900) Optimize MDAM scans with small scanner
[ https://issues.apache.org/jira/browse/TRAFODION-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Owhadi updated TRAFODION-1900: --- Fix Version/s: 2.0-incubating > Optimize MDAM scans with small scanner > -- > > Key: TRAFODION-1900 > URL: https://issues.apache.org/jira/browse/TRAFODION-1900 > Project: Apache Trafodion > Issue Type: Improvement > Components: sql-exe >Reporter: Eric Owhadi >Assignee: Eric Owhadi > Fix For: 2.0-incubating > > > When doing MDAM scans, we perform interlaced scans for the PROBE and for > the real scan. The probes always return only one row, then we close the scanner > immediately, so they should always use the small scanner. I will make it > conditional on the existing CQD HBASE_SMALL_SCANNER (either SYSTEM or ON). In > addition, blocks retrieved by a probe should always get at least one > successful cache hit on the next MDAM scan, so forcing > caching ON for MDAM probes is a good idea. Again, I will make this forcing > conditional on HBASE_SMALL_SCANNER SYSTEM or ON. > Then for the real scan part of MDAM, I will use the following heuristic: if the > previous scan fit in one HBase block, then it is likely that the next will > also fit in one HBase block, so enable the small scanner for the next scan. > Again, all this only if the CQD above is ON or SYSTEM. > Results of using the small scanner on MDAM where it makes sense showed a 1.39X > speed improvement... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1900) Optimize MDAM scans with small scanner
Eric Owhadi created TRAFODION-1900: -- Summary: Optimize MDAM scans with small scanner Key: TRAFODION-1900 URL: https://issues.apache.org/jira/browse/TRAFODION-1900 Project: Apache Trafodion Issue Type: Improvement Components: sql-exe Reporter: Eric Owhadi Assignee: Eric Owhadi When doing MDAM scans, we perform interlaced scans for the PROBE and for the real scan. The probes always return only one row, then we close the scanner immediately, so they should always use the small scanner. I will make it conditional on the existing CQD HBASE_SMALL_SCANNER (either SYSTEM or ON). In addition, blocks retrieved by a probe should always get at least one successful cache hit on the next MDAM scan, so forcing caching ON for MDAM probes is a good idea. Again, I will make this forcing conditional on HBASE_SMALL_SCANNER SYSTEM or ON. Then for the real scan part of MDAM, I will use the following heuristic: if the previous scan fit in one HBase block, then it is likely that the next will also fit in one HBase block, so enable the small scanner for the next scan. Again, all this only if the CQD above is ON or SYSTEM. Results of using the small scanner on MDAM where it makes sense showed a 1.39X speed improvement... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1863) With hbase_filter_preds set to '2', wrong results are returned for a specific use case.
[ https://issues.apache.org/jira/browse/TRAFODION-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191242#comment-15191242 ] Eric Owhadi commented on TRAFODION-1863: this closes JIRA 1863 > With hbase_filter_preds set to '2', wrong results are returned for a specific > use case. > --- > > Key: TRAFODION-1863 > URL: https://issues.apache.org/jira/browse/TRAFODION-1863 > Project: Apache Trafodion > Issue Type: Bug >Reporter: Selvaganesan Govindarajan >Assignee: Eric Owhadi > > create table t056t57 (a1 numeric(2,2) signed default 0 not null); > showddl t056t57; > insert into t056t57 default values; > select * from t056t57; > >>select * from t056t57 ; > A1 > --- > .00 > --- 1 row(s) selected. > >>cqd hbase_filter_preds '2' ; > --- SQL operation complete. > >>select * from t056t57 ; > .. > --- 0 row(s) selected. > >> > This was causing core/TEST056 to fail with PR #340. A possibly similar issue > exists with core/TEST029 too; currently that test case runs with > hbase_filter_preds set to 'ON' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TRAFODION-1863) With hbase_filter_preds set to '2', wrong results are returned for a specific use case.
[ https://issues.apache.org/jira/browse/TRAFODION-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Owhadi updated TRAFODION-1863: --- fixed with PR364 > With hbase_filter_preds set to '2', wrong results are returned for a specific > use case. > --- > > Key: TRAFODION-1863 > URL: https://issues.apache.org/jira/browse/TRAFODION-1863 > Project: Apache Trafodion > Issue Type: Bug >Reporter: Selvaganesan Govindarajan >Assignee: Eric Owhadi > > create table t056t57 (a1 numeric(2,2) signed default 0 not null); > showddl t056t57; > insert into t056t57 default values; > select * from t056t57; > >>select * from t056t57 ; > A1 > --- > .00 > --- 1 row(s) selected. > >>cqd hbase_filter_preds '2' ; > --- SQL operation complete. > >>select * from t056t57 ; > .. > --- 0 row(s) selected. > >> > This was causing core/TEST056 to fail with PR #340. A possibly similar issue > exists with core/TEST029 too; currently that test case runs with > hbase_filter_preds set to 'ON' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (TRAFODION-1877) CORE/TEST131 failed when run in standalone
[ https://issues.apache.org/jira/browse/TRAFODION-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Owhadi closed TRAFODION-1877. -- Resolution: Not A Problem was the dependency on first running CORE/TEST000 to create the prerequisits > CORE/TEST131 failed when run in standalone > -- > > Key: TRAFODION-1877 > URL: https://issues.apache.org/jira/browse/TRAFODION-1877 > Project: Apache Trafodion > Issue Type: Bug > Environment: trafodion master pulled on March 4th 2016 >Reporter: Eric Owhadi > > $scriptsdir/core/runregr -sb TEST131 > getting this result: > 18c18,20 > < --- SQL operation complete. > --- > > *** ERROR[1390] Object TRAFODION.TRAFODION.T131A already exists in > > TRAFODION. > > > > --- SQL operation failed with errors. > 30c32,34 > < --- SQL operation complete. > --- > > *** ERROR[1390] Object TRAFODION.TRAFODION.T131B already exists in > > TRAFODION. > > > > --- SQL operation failed with errors. > 42c46,48 > < --- SQL operation complete. > --- > > *** ERROR[1390] Object TRAFODION.TRAFODION.T131C already exists in > > TRAFODION. > > > > --- SQL operation failed with errors. > 46c52,54 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 49c57,59 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 52c62,64 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 71d82 > < Query_Invalidation_Keys { > 79c90,92 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 84c97,99 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. 
> > > > --- SQL operation failed with errors. > 87c102,104 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 90c107,109 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 94c113,115 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 97c118,120 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 100c123,125 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 107c132,134 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 110c137,139 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 113c142,144 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 117c148,150 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 120c153,155 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 123c158,160 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. 
> 129,132d165 > < *** ERROR[4481] The user does not have SELECT privilege on table or view > #CAT.#SCH.T131C. > < > < *** ERROR[8822] The statement was not prepared. > < > 149c182,184 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 152c187,189 > < --- 1 row(s) inserted. > --- > > *** ERROR[8102] The operation is prevented by a unique constraint. > > > > --- 0 row(s) inserted. > 177c214,216 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 181c220,222 > < --- SQL operation complete. > --- > > *** ERROR[1222] Command not supported when authorization is not enabled. > > > > --- SQL operation failed with errors. > 200c241 > < 1
[jira] [Closed] (TRAFODION-1876) CORE/TEST116 fails when run in standalone
[ https://issues.apache.org/jira/browse/TRAFODION-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Owhadi closed TRAFODION-1876. -- Resolution: Not A Bug CORE/TEST000 must be run first to set up some prerequisites > CORE/TEST116 fails when run in standalone > - > > Key: TRAFODION-1876 > URL: https://issues.apache.org/jira/browse/TRAFODION-1876 > Project: Apache Trafodion > Issue Type: Bug > Components: Build Infrastructure > Environment: using trafodion master of March 4th 2016 and running on > a development build machine. >Reporter: Eric Owhadi > > using this command to run the test: > $scriptsdir/core/runregr -sb TEST116 > getting this diff116: > 454c454 > < *** ERROR[1431] Object #CAT.#SCH.T116T1 exists in HBase. This could be due > to a concurrent transactional ddl operation in progress on this table. > --- > > *** ERROR[1390] Object TRAFODION.TRAFODION.T116T1 already exists in > > TRAFODION. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TRAFODION-1876) CORE/TEST116 fails when run in standalone
[ https://issues.apache.org/jira/browse/TRAFODION-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180119#comment-15180119 ] Eric Owhadi edited comment on TRAFODION-1876 at 3/4/16 4:45 PM: OK, nice catch that was it, closing this JIRA... was (Author: eowhadi): did not check, is core/test000 supposed to be run when running all test? Because I got error running all test too. Is TEST000 supposed to create SCH and put it in default table? > CORE/TEST116 fails when run in standalone > - > > Key: TRAFODION-1876 > URL: https://issues.apache.org/jira/browse/TRAFODION-1876 > Project: Apache Trafodion > Issue Type: Bug > Components: Build Infrastructure > Environment: using trafodion master of March 4th 2016 and running on > a development build machine. >Reporter: Eric Owhadi > > using this command to run test: > $scriptsdir/core/runregr -sb TEST116 > getting this diff116: > 454c454 > < *** ERROR[1431] Object #CAT.#SCH.T116T1 exists in HBase. This could be due > to a concurrent transactional ddl operation in progress on this table. > --- > > *** ERROR[1390] Object TRAFODION.TRAFODION.T116T1 already exists in > > TRAFODION. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1876) CORE/TEST116 fails when run in standalone
[ https://issues.apache.org/jira/browse/TRAFODION-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180119#comment-15180119 ] Eric Owhadi commented on TRAFODION-1876: did not check, is core/test000 supposed to be run when running all test? Because I got error running all test too. Is TEST000 supposed to create SCH and put it in default table? > CORE/TEST116 fails when run in standalone > - > > Key: TRAFODION-1876 > URL: https://issues.apache.org/jira/browse/TRAFODION-1876 > Project: Apache Trafodion > Issue Type: Bug > Components: Build Infrastructure > Environment: using trafodion master of March 4th 2016 and running on > a development build machine. >Reporter: Eric Owhadi > > using this command to run test: > $scriptsdir/core/runregr -sb TEST116 > getting this diff116: > 454c454 > < *** ERROR[1431] Object #CAT.#SCH.T116T1 exists in HBase. This could be due > to a concurrent transactional ddl operation in progress on this table. > --- > > *** ERROR[1390] Object TRAFODION.TRAFODION.T116T1 already exists in > > TRAFODION. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1877) CORE/TEST131 failed when run in standalone
Eric Owhadi created TRAFODION-1877: -- Summary: CORE/TEST131 failed when run in standalone Key: TRAFODION-1877 URL: https://issues.apache.org/jira/browse/TRAFODION-1877 Project: Apache Trafodion Issue Type: Bug Environment: trafodion master pulled on March 4th 2016 Reporter: Eric Owhadi $scriptsdir/core/runregr -sb TEST131 getting this result: 18c18,20 < --- SQL operation complete. --- > *** ERROR[1390] Object TRAFODION.TRAFODION.T131A already exists in TRAFODION. > > --- SQL operation failed with errors. 30c32,34 < --- SQL operation complete. --- > *** ERROR[1390] Object TRAFODION.TRAFODION.T131B already exists in TRAFODION. > > --- SQL operation failed with errors. 42c46,48 < --- SQL operation complete. --- > *** ERROR[1390] Object TRAFODION.TRAFODION.T131C already exists in TRAFODION. > > --- SQL operation failed with errors. 46c52,54 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 49c57,59 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 52c62,64 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 71d82 < Query_Invalidation_Keys { 79c90,92 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 84c97,99 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 87c102,104 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 90c107,109 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 
94c113,115 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 97c118,120 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 100c123,125 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 107c132,134 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 110c137,139 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 113c142,144 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 117c148,150 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 120c153,155 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 123c158,160 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 129,132d165 < *** ERROR[4481] The user does not have SELECT privilege on table or view #CAT.#SCH.T131C. < < *** ERROR[8822] The statement was not prepared. < 149c182,184 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 152c187,189 < --- 1 row(s) inserted. --- > *** ERROR[8102] The operation is prevented by a unique constraint. > > --- 0 row(s) inserted. 177c214,216 < --- SQL operation complete. 
--- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 181c220,222 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 200c241 < 1 1 1 1 1 1 --- > 1 45 1 1 1 1 205,208d245 < *** WARNING[8597] Statement was automatically retried 1 time(s). Delay before each retry was 0 seconds. See next entry for the error that caused this retry. < < *** WARNING[8734] Statement must be recompiled to allow privileges to be re-evaluated. < 218c255 < 1 23 1 1 1 1 --- > 1 67 1 1 1 1 236c273,275 < --- SQL operation complete. --- > *** ERROR[1222] Command not supported when authorization is not enabled. > > --- SQL operation failed with errors. 254c293 < 1 23 1 1 1 1 --- > 1 67 1 1 1 1 259,267c298 < *** ERROR[4481]
[jira] [Created] (TRAFODION-1876) CORE/TEST116 fails when run in standalone
Eric Owhadi created TRAFODION-1876: -- Summary: CORE/TEST116 fails when run in standalone Key: TRAFODION-1876 URL: https://issues.apache.org/jira/browse/TRAFODION-1876 Project: Apache Trafodion Issue Type: Bug Components: Build Infrastructure Environment: using trafodion master of March 4th 2016 and running on a development build machine. Reporter: Eric Owhadi using this command to run test: $scriptsdir/core/runregr -sb TEST116 getting this diff116: 454c454 < *** ERROR[1431] Object #CAT.#SCH.T116T1 exists in HBase. This could be due to a concurrent transactional ddl operation in progress on this table. --- > *** ERROR[1390] Object TRAFODION.TRAFODION.T116T1 already exists in TRAFODION. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1811) upsert failure when using nullable key columns
Eric Owhadi created TRAFODION-1811: -- Summary: upsert failure when using nullable key columns Key: TRAFODION-1811 URL: https://issues.apache.org/jira/browse/TRAFODION-1811 Project: Apache Trafodion Issue Type: Bug Components: sql-exe Reporter: Eric Owhadi Upsert fails on predicate evaluation when a nullable column is used in the primary key. Several consecutive, probably related problems follow... see the test below: >>create table t(a int, b int not null not droppable unique, primary key(a,b)); --- SQL operation complete. >>insert into t values(null,1); --- 1 row(s) inserted. >>select * from t; AB --- --- ?1 --- 1 row(s) selected. >>upsert into t values(null,2); *** ERROR[4099] A NULL operand is not allowed in predicate (TRAFODION.TPCDSGOOD.T.A = NULL). *** ERROR[8822] The statement was not prepared. >>upsert into t values(2,2); --- 1 row(s) inserted. >>upsert into t values(1,1); --- 1 row(s) inserted. >>select * from t; AB --- --- ?1 --- 1 row(s) selected. >>insert into t values (2,2); *** ERROR[8102] The operation is prevented by a unique constraint. --- 0 row(s) inserted. >>select * from t; AB --- --- ?1 --- 1 row(s) selected. >>select * from t where b=2; --- 0 row(s) selected. >>insert into t values (3,3); --- 1 row(s) inserted. >>select * from t; AB --- --- ?1 33 --- 2 row(s) selected. So, questions after this test: - Shouldn't upsert use the special null semantics instead of failing on predicate evaluation? - It looks like the upsert of (2,2) succeeded, but the next select * does not show it… however, trying to insert (2,2) fails due to the unique constraint… so it looks like the upsert worked halfway… - Why would the second upsert say there was an insert, when the next select * statement shows that neither an insert nor an update was performed? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance
[ https://issues.apache.org/jira/browse/TRAFODION-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Owhadi closed TRAFODION-1420. -- > Use ClientSmallScanner for small scans to improve performance > - > > Key: TRAFODION-1420 > URL: https://issues.apache.org/jira/browse/TRAFODION-1420 > Project: Apache Trafodion > Issue Type: Improvement > Components: sql-cmp, sql-exe >Reporter: Eric Owhadi >Assignee: Eric Owhadi > Labels: performance > > HBase implements an optimization for small scans (defined as scanning less > than a data block, i.e. 64 KB), resulting in a 3X performance improvement. The > underlying trick is cutting down RPC calls from 3 (OpenScan/Next/Close) > to 1, and using stateless pread instead of the stateful, locking seek/read > method to read data. This JIRA is about making the compiler aware of whether > a scan will act on a single data block (small) or not, and passing this > to the executor so that it can use the right parameter for the scan > (scan.setSmall(boolean)). > reference: > https://issues.apache.org/jira/browse/HBASE-9488 > https://issues.apache.org/jira/browse/HBASE-7266 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
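The compile-time decision this issue proposes can be sketched roughly as follows (hypothetical names, assuming the default 64 KB HBase data-block size; not the actual compiler code):

```python
# Sketch: mark a scan "small" when its estimated size fits in one
# HBase data block. Assumption: 64 KB default block size.

HBASE_BLOCK_SIZE = 64 * 1024

def should_set_small(estimated_rows, avg_row_bytes):
    """Compiler-side estimate: does the whole scan fit in one data block?"""
    return estimated_rows * avg_row_bytes <= HBASE_BLOCK_SIZE
```

When the estimate holds, the executor would flag the scan as small (the doc's `scan.setSmall(boolean)` on the HBase client `Scan`), collapsing the OpenScan/Next/Close exchange into a single RPC.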
[jira] [Resolved] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance
[ https://issues.apache.org/jira/browse/TRAFODION-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Owhadi resolved TRAFODION-1420. Resolution: Fixed implemented and merged in PR 284 > Use ClientSmallScanner for small scans to improve performance > - > > Key: TRAFODION-1420 > URL: https://issues.apache.org/jira/browse/TRAFODION-1420 > Project: Apache Trafodion > Issue Type: Improvement > Components: sql-cmp, sql-exe >Reporter: Eric Owhadi >Assignee: Eric Owhadi > Labels: performance > Fix For: 2.0-incubating > > > HBase implements an optimization for small scans (defined as scanning less > than a data block, i.e. 64 KB), resulting in a 3X performance improvement. The > underlying trick is cutting down RPC calls from 3 (OpenScan/Next/Close) > to 1, and using stateless pread instead of the stateful, locking seek/read > method to read data. This JIRA is about making the compiler aware of whether > a scan will act on a single data block (small) or not, and passing this > to the executor so that it can use the right parameter for the scan > (scan.setSmall(boolean)). > reference: > https://issues.apache.org/jira/browse/HBASE-9488 > https://issues.apache.org/jira/browse/HBASE-7266 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance
[ https://issues.apache.org/jira/browse/TRAFODION-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Owhadi updated TRAFODION-1420: --- Fix Version/s: (was: 2.0-incubating) > Use ClientSmallScanner for small scans to improve performance > - > > Key: TRAFODION-1420 > URL: https://issues.apache.org/jira/browse/TRAFODION-1420 > Project: Apache Trafodion > Issue Type: Improvement > Components: sql-cmp, sql-exe >Reporter: Eric Owhadi >Assignee: Eric Owhadi > Labels: performance > > HBase implements an optimization for small scans (defined as scanning less > than a data block, i.e. 64 KB), resulting in a 3X performance improvement. The > underlying trick is cutting down RPC calls from 3 (OpenScan/Next/Close) > to 1, and using stateless pread instead of the stateful, locking seek/read > method to read data. This JIRA is about making the compiler aware of whether > a scan will act on a single data block (small) or not, and passing this > to the executor so that it can use the right parameter for the scan > (scan.setSmall(boolean)). > reference: > https://issues.apache.org/jira/browse/HBASE-9488 > https://issues.apache.org/jira/browse/HBASE-7266 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1771) TESTRTS fails
[ https://issues.apache.org/jira/browse/TRAFODION-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118457#comment-15118457 ] Eric Owhadi commented on TRAFODION-1771: Hi Sandhya, I have a test pull request that I kept open for you that was showcasing the failure on Jenkins. If you need it let me know, else I can delete it. Cheers, Eric > TESTRTS fails > - > > Key: TRAFODION-1771 > URL: https://issues.apache.org/jira/browse/TRAFODION-1771 > Project: Apache Trafodion > Issue Type: Bug > Components: Build Infrastructure >Affects Versions: 2.0-incubating >Reporter: Eric Owhadi >Assignee: Selvaganesan Govindarajan > Attachments: corefiles.log > > > TESTRTS is failing with core dumped on PR255 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1783) DIVISION BY feature not documented in SQL reference manual
Eric Owhadi created TRAFODION-1783: -- Summary: DIVISION BY feature not documented in SQL reference manual Key: TRAFODION-1783 URL: https://issues.apache.org/jira/browse/TRAFODION-1783 Project: Apache Trafodion Issue Type: Documentation Components: documentation Affects Versions: any Reporter: Eric Owhadi -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (TRAFODION-1662) Predicate push down revisited (V2)
[ https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Owhadi closed TRAFODION-1662. -- Resolution: Fixed PR-255 > Predicate push down revisited (V2) > -- > > Key: TRAFODION-1662 > URL: https://issues.apache.org/jira/browse/TRAFODION-1662 > Project: Apache Trafodion > Issue Type: Improvement > Components: sql-exe >Affects Versions: 2.0-incubating >Reporter: Eric Owhadi >Assignee: Eric Owhadi > Labels: predicate, pushdown > Attachments: Advanced predicate push down feature.docx, Advanced > predicate push down feature.docx, Performance results analyzing effects of > optimizations introduced in pushdown V2.docx > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Currently, Trafodion predicate push down to HBase supports only the > following cases: > AND AND… > And it requires columns to be "SERIALIZED" (comparable using a binary > comparator), > and the value data type must not be a superset of the column data type, > and the char type must not be case insensitive or upshifted, > and there is no support for Big Numbers. > It suffers from several issues: > - Handling of nullable columns: > When a nullable column is involved in the predicate, because of the way nulls > are handled in Trafodion (either a missing cell, or a cell with the first byte > set to xFF), binary compare cannot do a good job at semantically treating > NULL the way a SQL expression would require. So the current behavior is that > null column values are never filtered out and always returned, letting > Trafodion perform a second-pass predicate evaluation to deal with nulls. This > can quickly turn counterproductive for very sparse columns, as we would > perform useless filtering on the region server side (since all nulls pass), > and the optimizer has not been coded to turn off the feature on sparse columns. 
> In addition, since null handling is done on the Trafodion side, the current code > artificially pulls in all key columns to make sure that a null coded as an absent > cell is correctly pushed up for evaluation at the Trafodion layer. This could be > optimized by requiring only a single non-nullable column in the current code, but > that is another story… as you will see below, the proposed new way of doing > pushdown will handle 100% of nulls at the HBase layer, therefore requiring a > non-nullable column to be added only when a nullable column is needed in the select > statement (not in the predicate). > - Always returning predicate columns: > Select a from t where b>10 would always return the b column to Trafodion, > even if b is non-nullable. This is not necessary and results in useless > network and CPU consumption, even if the predicate is not re-evaluated. > The new advanced predicate push down feature will do the following: > Support any of these primitives: > > (nice to have, high cost of custom filter, low > value after the TPC-DS query survey) > Is null > Is not null > Like -> to be investigated, not yet covered in this document > Any combination of these primitives with an arbitrary number of OR and AND with ( > ) associations, given that within () there is only either any number of ORs or > any number of ANDs, with no mixing of OR and AND inside (). I suspect that the normalizer > will always convert expressions so that this mixing never happens… > And it will remove the two shortcomings of the previous implementation: all null cases > will be handled at the HBase layer, never requiring re-evaluation and the > associated pushing up of null columns, and predicate columns will not be > pushed up if not needed by the node for tasks other than the predicate > evaluation. > Note that BETWEEN and IN predicates, when normalized to one of the forms > supported above, will be pushed down too. Nothing in the code needs to be > done to support this. 
> Improvement of explain: > We currently do not show predicate push down information in the scan node. 2 > key information is needed: > - Is predicate push down used > - What columns are retrieved by the scan node (investigate why we get > column all instead of accurate information) > The first one is obviously used to determine if all the conditions are met to > have push down available, and the second is used to make sure we are not > pushing up data from columns we don’t need. > Note that columns info is inconsistently shown today. Need to fix this. > Enablement using an existing ON/OFF CQD (HBASE_FILTER_PREDS) that will be > replaced with a multi value CQD that will enable various level of push down > optimization, like we have on PCODE optimization level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TRAFODION-1662) Predicate push down revisited (V2)
[ https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Owhadi updated TRAFODION-1662:
---
Attachment: Performance results analyzing effects of optimizations introduced in pushdown V2.docx
performance impact of predicate pushdown V2
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
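The combination rule described in TRAFODION-1662 above, that each parenthesized group contains either only ORs or only ANDs, is what lets a predicate map directly onto nested HBase FilterList objects (MUST_PASS_ALL for AND, MUST_PASS_ONE for OR). A small Python sketch of that mapping, with illustrative names and a simplified `col > value` leaf predicate (not Trafodion code):

```python
# Sketch: a pushdown-friendly predicate tree -- each group uses a single
# connector ('AND' or 'OR'), groups may nest, but never mix connectors.
# Such a tree translates one-to-one into nested HBase FilterList specs.

def to_filter_list(node):
    """Translate a predicate tree into a nested HBase-style filter spec."""
    if node[0] == "leaf":
        return node[1]                      # e.g. ("a", 1) meaning a > 1
    op, children = node
    mode = "MUST_PASS_ALL" if op == "AND" else "MUST_PASS_ONE"
    return (mode, [to_filter_list(c) for c in children])

def evaluate(node, row):
    """Reference evaluation of the same tree against a row dict."""
    if node[0] == "leaf":
        col, lo = node[1]
        return row.get(col, 0) > lo
    op, children = node
    results = (evaluate(c, row) for c in children)
    return all(results) if op == "AND" else any(results)

# (a > 1 AND b > 2) OR c > 3 -- each parenthesized group is uniform.
tree = ("OR", [("AND", [("leaf", ("a", 1)), ("leaf", ("b", 2))]),
               ("leaf", ("c", 3))])
assert evaluate(tree, {"a": 2, "b": 3, "c": 0}) is True
assert evaluate(tree, {"a": 0, "b": 3, "c": 0}) is False
assert to_filter_list(tree)[0] == "MUST_PASS_ONE"
```

Because BETWEEN and IN normalize into exactly this shape (a uniform AND group or a uniform OR group), they ride along for free, as the issue notes.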
[jira] [Commented] (TRAFODION-1771) TESTRTS fails
[ https://issues.apache.org/jira/browse/TRAFODION-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111518#comment-15111518 ] Eric Owhadi commented on TRAFODION-1771: Steve Arnaud can provide access to a VM that shows the problem 100% of the time, with just two tests to run (TEST005 followed by TESTRTS) > TESTRTS fails > - > > Key: TRAFODION-1771 > URL: https://issues.apache.org/jira/browse/TRAFODION-1771 > Project: Apache Trafodion > Issue Type: Bug > Components: Build Infrastructure >Affects Versions: 2.0-incubating >Reporter: Eric Owhadi > Attachments: corefiles.log > > > TESTRTS is failing with core dumped on PR255 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TRAFODION-1771) TESTRTS fails
[ https://issues.apache.org/jira/browse/TRAFODION-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Owhadi updated TRAFODION-1771:
---
Attachment: corefiles.log
Back trace file at time of failure
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1771) TESTRTS fails
[ https://issues.apache.org/jira/browse/TRAFODION-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111512#comment-15111512 ] Eric Owhadi commented on TRAFODION-1771: I have the core dump if someone needs it
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1771) TESTRTS fails
[ https://issues.apache.org/jira/browse/TRAFODION-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111511#comment-15111511 ] Eric Owhadi commented on TRAFODION-1771: For the last two days, with the great help of Steve Arnaud, I have been hunting a weird failure blocking the merge of PR 255 (predicate pushdown V2) [TRAFODION-1662]. Thirteen days ago, PR 255 was passing Jenkins. Six days ago, after applying code-review changes to PR 255 and syncing to the latest master, Jenkins started failing core/TESTRTS with a core dump. So I created a fake PR on a new branch, with the initial code from 13 days ago, and merged it with the latest master -> Jenkins fails with the same error, demonstrating that the PR rework was not the root cause of this sudden wrong behavior. The issue is therefore a combination of my new code (initial or reworked) with some change in master that happened between 13 and 6 days ago. The failure does not happen in a dev environment (tested in both debug and release mode). Steve was able to duplicate it on a Jenkins server, and narrowed down the conditions for its appearance to the sequence core/TEST005 followed by core/TESTRTS. Without TEST005 as a catalyst, the issue does not manifest on the Jenkins server either. The stack trace at the time of the abort is not very helpful (8926 is the constant value of EXE_STAT_NOT_FOUND): EXE_STAT_NOT_FOUND can be raised from 34 different code paths, and unfortunately the structure of the code does not help narrow down, from stack trace analysis, which of the 34 was crossed at the time of death.
-> I hate Murphy…
Thread 1 (Thread 0x7f26080e23c0 (LWP 1505)):
#0  0x7f2605260625 in raise () from /lib64/libc.so.6
#1  0x7f2605261d8d in abort () from /lib64/libc.so.6
#2  0x7f2604d39494 in ComCondition::setSQLCODE (this=, newSQLCODE=-8926) at ../export/ComDiags.cpp:1428
#3  0x7f2603911c56 in ExHandleErrors (qparent=..., down_entry=, matchNo=, globals=, diags_in=, err=4294958370, intParam1=0x0, stringParam1=0x0, nskErr=0x0, stringParam2=0x0) at ../executor/ex_error.cpp:170
#4  0x7f2603a01e26 in ExExeUtilGetRTSStatisticsTcb::work (this=0x7f25f39eec58) at ../executor/ExExeUtilGetStats.cpp:4222
#5  0x7f2603a5ee33 in ExScheduler::work (this=0x7f25f39ee7c0, prevWaitTime=) at ../executor/ExScheduler.cpp:331
#6  0x7f2603973752 in ex_root_tcb::execute (this=0x7f25f39f4c50, cliGlobals=0x2b70120, glob=0x7f25f39b2ca8, input_desc=0x7f25f39aa030, diagsArea=@0x7ffd2e100750, reExecute=0) at ../executor/ex_root.cpp:1058
#7  0x7f2604fe7654 in CliStatement::execute (this=0x7f25f39c0ea0, cliGlobals=0x2b70120, input_desc=0x7f25f39aa030, diagsArea=, execute_state=, fixupOnly=0, cliflags=0) at ../cli/Statement.cpp:4525
#8  0x7f2604f892ac in SQLCLI_PerformTasks(CliGlobals *, ULng32, SQLSTMT_ID *, SQLDESC_ID *, SQLDESC_ID *, Lng32, Lng32, typedef __va_list_tag __va_list_tag *, SQLCLI_PTR_PAIRS *, SQLCLI_PTR_PAIRS *) (cliGlobals=0x2b70120, tasks=4882, statement_id=0x3398010, input_descriptor=0x34c0eb0, output_descriptor=0x0, num_input_ptr_pairs=0, num_output_ptr_pairs=0, ap=0x7ffd2e1008f0, input_ptr_pairs=0x0, output_ptr_pairs=0x0) at ../cli/Cli.cpp:3297
#9  0x7f2604f89fe2 in SQLCLI_Exec(CliGlobals *, SQLSTMT_ID *, SQLDESC_ID *, Lng32, typedef __va_list_tag __va_list_tag *, SQLCLI_PTR_PAIRS *) (cliGlobals=, statement_id=, input_descriptor=, num_ptr_pairs=, ap=, ptr_pairs=) at ../cli/Cli.cpp:3544
#10 0x7f2604ff588b in SQL_EXEC_Exec (statement_id=0x3398010, input_descriptor=0x34c0eb0, num_ptr_pairs=0) at ../cli/CliExtern.cpp:2074
#11 0x7f26078bb99b in SqlCmd::doExec (sqlci_env=0x2b58c50, stmt=0x3398010, prep_stmt=, numUnnamedParams=, unnamedParamArray=, unnamedParamCharSetArray=, handleError=1) at ../sqlci/SqlCmd.cpp:1786
#12 0x7f26078bc392 in SqlCmd::do_execute (sqlci_env=0x2b58c50, prep_stmt=0x2cefd50, numUnnamedParams=0, unnamedParamArray=0x0, unnamedParamCharSetArray=0x0, prepcode=0) at ../sqlci/SqlCmd.cpp:2122
#13 0x7f26078bcabd in DML::process (this=0x2cf0080, sqlci_env=0x2b58c50) at ../sqlci/SqlCmd.cpp:2897
#14 0x7f26078a2844 in Obey::process (this=0x2ceff60, sqlci_env=) at ../sqlci/Obey.cpp:267
#15 0x7f26078a2844 in Obey::process (this=0x385d960, sqlci_env=) at ../sqlci/Obey.cpp:267
#16 0x7f26078a2844 in Obey::process (this=0x4662980, sqlci_env=) at ../sqlci/Obey.cpp:267
#17 0x7f26078ab074 in SqlciEnv::run (this=0x2b58c50, in_filename=, input_string=) at ../sqlci/SqlciEnv.cpp:729
#18 0x004019d2 in main (argc=2, argv=0x7ffd2e102578) at ../bin/SqlciMain.cpp:329
[jira] [Created] (TRAFODION-1771) TESTRTS fails
Eric Owhadi created TRAFODION-1771: -- Summary: TESTRTS fails Key: TRAFODION-1771 URL: https://issues.apache.org/jira/browse/TRAFODION-1771 Project: Apache Trafodion Issue Type: Bug Components: Build Infrastructure Affects Versions: 2.0-incubating Reporter: Eric Owhadi TESTRTS is failing with core dumped on PR255 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TRAFODION-1662) Predicate push down revisited (V2)
[ https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Owhadi updated TRAFODION-1662:
---
Attachment: Advanced predicate push down feature.docx
latest version, after code completed
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1671) hive regression TEST009 FAILS
Eric Owhadi created TRAFODION-1671:
--
Summary: hive regression TEST009 FAILS
Key: TRAFODION-1671
URL: https://issues.apache.org/jira/browse/TRAFODION-1671
Project: Apache Trafodion
Issue Type: Bug
Components: Build Infrastructure
Affects Versions: 1.3-incubating
Reporter: Eric Owhadi
Priority: Minor

regression test hive TEST009 fails (30 lines): diff files:
28c28,34
< --- 0 row(s) selected.
---
> SCHEMA_NAME
> --
>
> _HBASESTATS_
> _HB__CELL__
>
> --- 2 row(s) selected.
446,451c452,457
< 2 HIVE HIVE ITEM
< 3 TRAFODION _HV_HIVE_ PROMOTION
< 4 HIVE HIVE PROMOTION
< 5 TRAFODION _HV_SCH_T009_ T009T2
< 6 HIVE SCH_T009 T009T2
< 7 HIVE HIVE CUSTOMER
---
> 2 TRAFODION _HV_HIVE_ PROMOTION
> 3 HIVE HIVE PROMOTION
> 4 TRAFODION _HV_SCH_T009_ T009T2
> 5 HIVE SCH_T009 T009T2
> 6 HIVE HIVE CUSTOMER
> 7 HIVE HIVE ITEM
507a514
> _HBASESTATS_
511c518
< --- 2 row(s) selected.
---
> --- 3 row(s) selected.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1662) Predicate push down revisited (V2)
[ https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034713#comment-15034713 ] Eric Owhadi commented on TRAFODION-1662:
Thanks Suresh,
1/ me too :-)
2/ I am hoping that rms can just consider using the cardinality of the table as provided by stats, assuming we have done a full scan for row accessed?
3/ I propose we open a JIRA on dynamic RPC time out for both this one and on coprocessor count. But I think in future version of HBase, there is some form of heartbeat that should avoid the current RPC timeout issue. To be investigated.
4/ good point, thanks for mentioning
5/ indeed, I'll keep this in mind.
6/ I'll have a look, thanks for mentioning
[jira] [Updated] (TRAFODION-1662) Predicate push down revisited (V2)
[ https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Owhadi updated TRAFODION-1662:
---
Attachment: Advanced predicate push down feature.docx
blueprint
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (TRAFODION-1662) Predicate push down revisited (V2)
[ https://issues.apache.org/jira/browse/TRAFODION-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on TRAFODION-1662 started by Eric Owhadi.
--
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1662) Predicate push down revisited (V2)
Eric Owhadi created TRAFODION-1662: -- Summary: Predicate push down revisited (V2) Key: TRAFODION-1662 URL: https://issues.apache.org/jira/browse/TRAFODION-1662 Project: Apache Trafodion Issue Type: Improvement Components: sql-exe Affects Versions: 2.0-incubating Reporter: Eric Owhadi
Currently, Trafodion predicate push down to HBase supports only conjunctions (... AND ... AND ...). It also requires that columns be "SERIALIZED" (comparable with a binary comparator), that the value data type is not a superset of the column data type, that char types are not case insensitive or upshifted, and it has no support for Big Numbers. The current implementation suffers from several issues:
- Handling of nullable columns: when a nullable column is involved in the predicate, because of the way nulls are encoded in Trafodion (either a missing cell, or a cell whose first byte is set to 0xFF), a binary compare cannot treat NULL the way SQL semantics require. So the current behavior is that null column values are never filtered out and are always returned, letting Trafodion perform a second-pass predicate evaluation to deal with nulls. This can quickly turn counterproductive for very sparse columns, since the region-server-side filtering is useless (all nulls pass through) and the optimizer has not been coded to turn the feature off for sparse columns. In addition, since null handling is done on the Trafodion side, the current code artificially pulls up all key columns to make sure that a null encoded as an absent cell is correctly pushed up for evaluation at the Trafodion layer. This could be optimized by requiring only a single non-nullable column in the current code, but that is another story: as you will see below, the proposed new way of doing push down will handle 100% of nulls at the HBase layer, so a non-nullable column only needs to be added when a nullable column is needed in the select list (not in the predicate).
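The null-encoding problem described above can be made concrete with a small sketch. This is plain illustrative Java, not Trafodion source; the class name, the 0xFF indicator constant, and the raw comparator are assumptions modeling the behavior the JIRA describes (a NULL is either a missing cell or a cell whose first byte is 0xFF, and a byte-wise comparator cannot give it SQL semantics):

```java
// Illustrative sketch (not Trafodion source): classify a cell as SQL NULL,
// and show why a raw byte comparison mishandles the 0xFF null indicator.
public class NullEncoding {
    static final byte NULL_INDICATOR = (byte) 0xFF;

    // cell == null models a missing cell in the HBase row.
    public static boolean isSqlNull(byte[] cell) {
        return cell == null || (cell.length > 0 && cell[0] == NULL_INDICATOR);
    }

    // A raw byte comparison treats the null indicator as an ordinary (large)
    // byte, so "col > literal" wrongly matches a NULL column instead of
    // evaluating to UNKNOWN -- which is why the filter must be null-aware.
    public static boolean rawGreaterThan(byte[] cell, byte[] literal) {
        if (cell == null) return false;
        for (int i = 0; i < Math.min(cell.length, literal.length); i++) {
            int c = (cell[i] & 0xFF) - (literal[i] & 0xFF);
            if (c != 0) return c > 0;
        }
        return cell.length > literal.length;
    }
}
```

The last method shows the bug being worked around: a 0xFF-prefixed NULL value compares greater than any real value under binary comparison.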
- Always returning predicate columns: "Select a from t where b > 10" would always return the b column to Trafodion, even if b is non-nullable. This is not necessary and results in useless network and CPU consumption, even when the predicate is not re-evaluated.
The new advanced predicate push down feature will do the following. It will support any of these primitives:
- (nice to have: high cost of a custom filter, low value after the TPC-DS query survey)
- IS NULL
- IS NOT NULL
- LIKE: to be investigated, not yet covered in this document
and any combination of these primitives with an arbitrary number of ORs and ANDs with parenthesized associations, given that within a set of parentheses there are either only ORs or only ANDs, with no mixing of OR and AND inside the parentheses. I suspect the normalizer will always convert expressions so that this mixing never happens.
It will also remove the two shortcomings of the previous implementation: all null cases will be handled at the HBase layer, never requiring re-evaluation and the associated pushing up of null columns; and predicate columns will not be pushed up unless the node needs them for some task other than predicate evaluation.
Note that BETWEEN and IN predicates, when normalized into one of the forms supported above, will be pushed down too. Nothing in the code needs to be done to support this.
Improvement of explain: we currently do not show predicate push down information in the scan node. Two key pieces of information are needed:
- Is predicate push down used?
- What columns are retrieved by the scan node (investigate why we get "column all" instead of accurate information)?
The first is obviously used to determine whether all the conditions for push down are met, and the second is used to make sure we are not pushing up data from columns we don't need. Note that column info is inconsistently shown today; this needs to be fixed.
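The restricted AND/OR shape and the server-side null handling described above can be sketched as a tiny three-valued evaluator. This is a standalone illustration, not the Trafodion filter code; all names (`PushdownPredicate`, `Tri`, `gt`, `keepRow`) are hypothetical, and a row is modeled as a map where an absent key stands for SQL NULL:

```java
import java.util.Map;

// Sketch of the restricted predicate shape: each node is a pure AND or a pure
// OR of children (no mixing at one level), and NULL resolves to UNKNOWN at
// the filter layer, so NULL rows are filtered out server-side.
public class PushdownPredicate {
    public enum Tri { TRUE, FALSE, UNKNOWN }

    public interface Node { Tri eval(Map<String, Integer> row); }

    // Leaf: col > literal; an absent map entry models SQL NULL.
    public static Node gt(String col, int literal) {
        return row -> {
            Integer v = row.get(col);
            if (v == null) return Tri.UNKNOWN;   // comparison with NULL
            return v > literal ? Tri.TRUE : Tri.FALSE;
        };
    }

    public static Node and(Node... kids) {
        return row -> {
            Tri out = Tri.TRUE;
            for (Node k : kids) {
                Tri t = k.eval(row);
                if (t == Tri.FALSE) return Tri.FALSE;
                if (t == Tri.UNKNOWN) out = Tri.UNKNOWN;
            }
            return out;
        };
    }

    public static Node or(Node... kids) {
        return row -> {
            Tri out = Tri.FALSE;
            for (Node k : kids) {
                Tri t = k.eval(row);
                if (t == Tri.TRUE) return Tri.TRUE;
                if (t == Tri.UNKNOWN) out = Tri.UNKNOWN;
            }
            return out;
        };
    }

    // Only rows where the predicate is TRUE are returned to the client;
    // FALSE and UNKNOWN rows never leave the region server.
    public static boolean keepRow(Node pred, Map<String, Integer> row) {
        return pred.eval(row) == Tri.TRUE;
    }
}
```

Note how a row whose predicate column is NULL evaluates to UNKNOWN and is dropped at the "server", removing the second-pass evaluation the old implementation needed.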
Enablement will use the existing ON/OFF CQD (HBASE_FILTER_PREDS), which will be replaced with a multi-value CQD that enables various levels of push down optimization, like the PCODE optimization levels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1487) Hedged reads to boost read performance since HBase 1.0
Eric Owhadi created TRAFODION-1487: -- Summary: Hedged reads to boost read performance since HBase 1.0 Key: TRAFODION-1487 URL: https://issues.apache.org/jira/browse/TRAFODION-1487 Project: Apache Trafodion Issue Type: Sub-task Reporter: Eric Owhadi See here how to configure it: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/admin_hedged_reads.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1486) MultiWAL for write-heavy workloads is promising, available since HBase 1.0
Eric Owhadi created TRAFODION-1486: -- Summary: MultiWAL for write-heavy workloads is promising, available since HBase 1.0 Key: TRAFODION-1486 URL: https://issues.apache.org/jira/browse/TRAFODION-1486 Project: Apache Trafodion Issue Type: Sub-task Reporter: Eric Owhadi https://issues.apache.org/jira/browse/HBASE-5699 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1485) Patch for hbase.client.scanner.timeout.period logic until fix in hbase is available
[ https://issues.apache.org/jira/browse/TRAFODION-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732367#comment-14732367 ] Eric Owhadi commented on TRAFODION-1485:
Pasting the discussion on the subject from the hbase dev list:
You can take a look at HBASE-1: Renew Scanner Lease without advancing the RegionScanner, which may be helpful in this kind of case. Your proposal sounds like a good alternative approach as well. We should add that JIRA to the blog link Stack mentioned.
Jerry
On Sat, Sep 5, 2015 at 9:07 AM, Stack wrote:
> On Fri, Sep 4, 2015 at 5:06 PM, Eric Owhadi wrote:
> > OK, so to answer "is it easy to insert the patched scanner for
> > trafodion": the answer is no.
>
> I suspected this.
>
> > It was easier on .98, but on 1.0 it was quite a challenge, all about
> > dealing with private attributes (instead of protected) that are not
> > visible to the PatchClentScanner class that extends ClientScanner.
> > Currently running the regression tests to see that there are no side
> > effects... I was able to demonstrate, with a breakpoint on next() waiting
> > more than 1 minute (the default lease timeout value), that with the patch
> > things gracefully reset and all is good, no row skipped or duplicated,
> > while without it I get the scanner timeout exception. The patch can be
> > turned on or off with a new key in hbase-site.xml...
> > I will feel better when this is deprecated :-).
>
> Smile.
> Excellent. You have a patch for us then, Eric? Sounds like the
> interjection of your new Scanner would be for pre-2.0. For 2.0 we
> should just turn on this behavior as the default.
> Thanks,
> St.Ack
>
> > Eric Owhadi
> >
> > -Original Message-
> > From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
> > Sent: Friday, August 28, 2015 6:35 PM
> > To: HBase Dev List
> > Subject: Re: Question on hbase.client.scanner.timeout.period
> >
> > On Fri, Aug 28, 2015 at 11:31 AM, Eric Owhadi wrote:
> >
> > > That sounds good, but given that trafodion needs to work on current and
> > > future released versions of HBase, unpatched, I will first
> > > implement a ClientScannerTrafodion (to be deprecated), inheriting
> > > from ClientScanner, that will just overload loadCache(), and
> > > make sure that the code that picks the right scanner based
> > > on the scan object is bypassed, to force getting the
> > > ClientScannerTrafodion when appropriate.
> > > Not very elegant, but I need to take trafodion deployment
> > > requirements into consideration.
> > > Then, if we do not discover any side effect during our QA related
> > > to this code, I will port the fix to HBase to deprecate the custom
> > > scanner (probably first on HBase 2.0, then we will let the community
> > > decide if this fix is worth back porting...). It will be a
> > > first for me, but that's great, I'll take your offer to help ;-)...
> >
> > Sweet. Suggest opening an umbrella issue in hbase to implement this
> > feature. Reference HBASE-2161 (it is closed now). Link the trafodion
> > issue to it. A subtask could have the implementation in hbase 2.0,
> > another could be the backport.
> >
> > Is it easy to insert your T*ClientScanner?
> > St.Ack
> >
> > > Regards,
> > > Eric
> > >
> > > -Original Message-
> > > From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
> > > Sent: Thursday, August 27, 2015 3:55 PM
> > > To: HBase Dev List
> > > Subject: Re: Question on hbase.client.scanner.timeout.period
> > >
> > > On Thu, Aug 27, 2015 at 1:39 PM, Eric Owhadi wrote:
> > >
> > > > Oops, my bad, the related JIRA was:
> > > > https://issues.apache.org/jira/browse/HBASE-2161
> > > >
> > > > I am suggesting that the special client-side code in loadCache()
> > > > of ClientScanner that traps the UnknownScannerException, and then
> > > > on purpose checks whether it is coming from a lease timeout (and
> > > > not from a region move) so as to throw a
> > > > ScannerTimeoutException, instead of letting the code go on and just
> > > > reset the scanner and restart from the last successful retrieve (the
> > > > way it works for an UnknownScannerException due to a region moving).
> > > > By just removing the special handling that tries to single out an
> > > > UnknownScannerException due to lease timeout,
> > > > we should have a resolution to JIRA 2161, and to our trafodion issue.
> > > >
> > > > We are still protected against a dead client that would cause a
> > > > resource leak at the region server, since we keep the lease timeout
> > > > mechanism.
> > > >
> > > > Not sure if I have overlooked something; as usual, code is
> > > > here for a reason :-)...
> > >
> > > Your proposal sounds good to me.
> > >
> > > Scanner works the way it does because it has always worked this way
> > (smile).
> > > A while back, one of t
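The recovery behavior discussed in the thread above can be sketched in isolation: on a lease-expired `UnknownScannerException`, reopen the scanner just past the last row already returned and keep going, instead of surfacing a `ScannerTimeoutException`. This is a self-contained mock (no HBase dependency); `MockScanner` stands in for the region-server side and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

public class LeaseRecoveringScanner {
    public static class UnknownScannerException extends RuntimeException {}

    // Region-server stand-in: serves rows in key order; throws once after
    // 'failAfter' rows have been served, simulating an expired scanner lease.
    public static class MockScanner {
        private final NavigableMap<Integer, String> data;
        private Iterator<Map.Entry<Integer, String>> it;
        private int served = 0;
        private int failAfter;
        public MockScanner(NavigableMap<Integer, String> data, int failAfter) {
            this.data = data;
            this.failAfter = failAfter;
            open(Integer.MIN_VALUE);
        }
        void open(int fromKeyExclusive) {
            it = data.tailMap(fromKeyExclusive, false).entrySet().iterator();
        }
        Map.Entry<Integer, String> next() {
            if (served == failAfter) { failAfter = -1; throw new UnknownScannerException(); }
            if (!it.hasNext()) return null;
            served++;
            return it.next();
        }
    }

    // Client-side loop: on lease loss, silently reopen just past the last
    // successfully returned key and continue -- no row skipped or duplicated,
    // and no timeout exception surfaced to the caller.
    public static List<String> scanAll(MockScanner s) {
        List<String> out = new ArrayList<>();
        int lastKey = Integer.MIN_VALUE;
        while (true) {
            Map.Entry<Integer, String> e;
            try { e = s.next(); }
            catch (UnknownScannerException use) { s.open(lastKey); continue; }
            if (e == null) return out;
            lastKey = e.getKey();
            out.add(e.getValue());
        }
    }
}
```

Keeping the lease mechanism itself (as the thread notes) still protects the region server from dead clients; only the client-side reaction changes.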
[jira] [Work started] (TRAFODION-1485) Patch for hbase.client.scanner.timeout.period logic until fix in hbase is available
[ https://issues.apache.org/jira/browse/TRAFODION-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on TRAFODION-1485 started by Eric Owhadi. -- > Patch for hbase.client.scanner.timeout.period logic until fix in hbase is > available > --- > > Key: TRAFODION-1485 > URL: https://issues.apache.org/jira/browse/TRAFODION-1485 > Project: Apache Trafodion > Issue Type: Bug > Components: sql-exe >Affects Versions: 1.1 (pre-incubation) >Reporter: Eric Owhadi >Assignee: Eric Owhadi > Labels: patch > > We have been facing a situation in trafodion where we are hitting the > hbase.client.scanner.timeout.period scenario: > basically, when doing queries that require spilling to disk because of the high > complexity of what is involved, the underlying hbase scanner serving one of > the operations in the complex query cannot call next() within the > specified timeout... too busy taking care of other business. > This is a legitimate scenario, and I was wondering why in the code special care is > taken to make sure that, client side, if a DNRIOE of type > UnknownScannerException shows up and the hbase.client.scanner.timeout.period > time has elapsed, we throw a ScannerTimeoutException instead of > just letting it go and resetting the scanner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1485) Patch for hbase.client.scanner.timeout.period logic until fix in hbase is available
Eric Owhadi created TRAFODION-1485: -- Summary: Patch for hbase.client.scanner.timeout.period logic until fix in hbase is available Key: TRAFODION-1485 URL: https://issues.apache.org/jira/browse/TRAFODION-1485 Project: Apache Trafodion Issue Type: Bug Components: sql-exe Affects Versions: 1.1 (pre-incubation) Reporter: Eric Owhadi
We have been facing a situation in trafodion where we are hitting the hbase.client.scanner.timeout.period scenario: basically, when doing queries that require spilling to disk because of the high complexity of what is involved, the underlying hbase scanner serving one of the operations in the complex query cannot call next() within the specified timeout... too busy taking care of other business. This is a legitimate scenario, and I was wondering why in the code special care is taken to make sure that, client side, if a DNRIOE of type UnknownScannerException shows up and the hbase.client.scanner.timeout.period time has elapsed, we throw a ScannerTimeoutException instead of just letting it go and resetting the scanner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1482) disabling BlockCache for all unbounded scans is not correct for dictionary tables
Eric Owhadi created TRAFODION-1482: -- Summary: disabling BlockCache for all unbounded scans is not correct for dictionary tables Key: TRAFODION-1482 URL: https://issues.apache.org/jira/browse/TRAFODION-1482 Project: Apache Trafodion Issue Type: Bug Components: sql-cmp, sql-exe Affects Versions: 1.1 (pre-incubation) Reporter: Eric Owhadi
There is a workaround that was implemented to avoid block cache thrashing triggered by full table scans. It is in HTableClient.java, in lines looking like:
  // Disable block cache for full table scan
  if (startRow == null && stopRow == null)
    scan.setCacheBlocks(false);
These lines bypass the cacheBlocks parameter passed to startScan, hence the workaround. However, it is potentially negative from another performance angle in situations like "dictionary tables" in a normalized schema. For example, if you have tables storing status codes, error codes, countries, etc., that are linked to with foreign keys, these tables are small, and I would imagine they will most likely be fetched and spread on ESPs for hash joins with startRow and stopRow null. They won't be cached with the workaround, but they should be. Cache thrashing is a problem only when scanning large tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
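A finer-grained policy than the quoted workaround can be sketched as follows: only disable the block cache for an unbounded scan when the table is estimated to be large, so small "dictionary" tables stay cached. This is an illustrative sketch, not Trafodion code; the class name, the row-count threshold, and the idea of feeding in an optimizer cardinality estimate are all assumptions:

```java
// Sketch: decide whether a scan should use the HBase block cache.
// Unbounded scans of big tables would thrash the cache; everything else
// (bounded scans, and full scans of small dictionary tables) should cache.
public class BlockCachePolicy {
    // Hypothetical threshold separating "dictionary" tables from big tables.
    static final long SMALL_TABLE_MAX_ROWS = 100_000;

    public static boolean useBlockCache(byte[] startRow, byte[] stopRow,
                                        long estimatedRowCount) {
        boolean unboundedScan = (startRow == null && stopRow == null);
        return !(unboundedScan && estimatedRowCount > SMALL_TABLE_MAX_ROWS);
    }
}
```

The result would then be passed to `scan.setCacheBlocks(...)` instead of unconditionally disabling caching for every unbounded scan.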
[jira] [Updated] (TRAFODION-1446) End Key missing for simple scan scenario
[ https://issues.apache.org/jira/browse/TRAFODION-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Owhadi updated TRAFODION-1446: --- Priority: Blocker (was: Critical) > End Key missing for simple scan scenario > > > Key: TRAFODION-1446 > URL: https://issues.apache.org/jira/browse/TRAFODION-1446 > Project: Apache Trafodion > Issue Type: Bug > Components: sql-exe >Affects Versions: 2.0-incubating > Environment: seen on developer build >Reporter: Eric Owhadi >Priority: Blocker > > Using a table created with something like: > Create table t131helper (a int not null, primary key(a)); > Insert into t131helper values(1); > Create table t131oneblock > (uniq int not null, > C100 int, > Str1 varchar(4000), > Primary key (uniq)); > Insert into t131oneblock > Select (100*x100)+(10*x10)+x1, > (100*x100)+(10*x10)+x1, > 'xxx' > From t131helper > Transpose 0,1,2,3,4,5,6,7,8,9 as x100 > Transpose 0,1,2,3,4,5,6,7,8,9 as x10 > Transpose 0,1,2,3,4,5,6,7,8,9 as x1; > so basically creating a table with keys 0,1,2,3,4... up to 999. > Then, running > Select * from t131oneblock where uniq > 2 and uniq < 5 > you will see that the end key is not populated on the scan operator (use explain > to notice the empty end key). I stepped into the code, and the error is not > just a wrong display in the explain; the end key is not populated down to the > java hbase scan invocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1446) End Key missing for simple scan scenario
Eric Owhadi created TRAFODION-1446: -- Summary: End Key missing for simple scan scenario Key: TRAFODION-1446 URL: https://issues.apache.org/jira/browse/TRAFODION-1446 Project: Apache Trafodion Issue Type: Bug Components: sql-exe Affects Versions: 2.0-incubating Environment: seen on developer build Reporter: Eric Owhadi Priority: Critical
Using a table created with something like:
Create table t131helper (a int not null, primary key(a));
Insert into t131helper values(1);
Create table t131oneblock
(uniq int not null,
C100 int,
Str1 varchar(4000),
Primary key (uniq));
Insert into t131oneblock
Select (100*x100)+(10*x10)+x1,
(100*x100)+(10*x10)+x1,
'xxx'
From t131helper
Transpose 0,1,2,3,4,5,6,7,8,9 as x100
Transpose 0,1,2,3,4,5,6,7,8,9 as x10
Transpose 0,1,2,3,4,5,6,7,8,9 as x1;
so basically creating a table with keys 0,1,2,3,4... up to 999. Then, running
Select * from t131oneblock where uniq > 2 and uniq < 5
you will see that the end key is not populated on the scan operator (use explain to notice the empty end key). I stepped into the code, and the error is not just a wrong display in the explain; the end key is not populated down to the java hbase scan invocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance
[ https://issues.apache.org/jira/browse/TRAFODION-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on TRAFODION-1420 started by Eric Owhadi. -- > Use ClientSmallScanner for small scans to improve performance > - > > Key: TRAFODION-1420 > URL: https://issues.apache.org/jira/browse/TRAFODION-1420 > Project: Apache Trafodion > Issue Type: Improvement > Components: sql-cmp, sql-exe >Reporter: Eric Owhadi >Assignee: Eric Owhadi > Labels: performance > Fix For: 2.0-incubating > > > HBase implements an optimization for small scans (defined as scanning less > than one data block, i.e. 64 KB) resulting in a 3X performance improvement. The > underlying trick is cutting the RPC calls down from 3 (OpenScan/Next/Close) > to 1, and using stateless pread instead of the stateful, locking seek/read > method to read data. This JIRA is about improving the compiler so that it can > determine whether a scan will act on a single data block (small) or not, and > passing this information to the executor so that it can use the right scan > parameter (scan.setSmall(boolean)). > reference: > https://issues.apache.org/jira/browse/HBASE-9488 > https://issues.apache.org/jira/browse/HBASE-7266 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
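The compiler-side decision described above can be sketched as a one-line heuristic: mark a scan "small" when its estimated byte footprint fits in a single HFile block (64 KB by default), so the executor can call `scan.setSmall(true)`. This is an illustrative sketch; the class name and the idea that the optimizer supplies row-count and row-size estimates are assumptions:

```java
// Sketch: decide whether a scan qualifies for HBase's small-scan
// optimization, i.e. whether it is expected to touch a single data block.
public class SmallScanHeuristic {
    // Default HFile block size; the real value would come from table options.
    static final long HFILE_BLOCK_SIZE = 64 * 1024;

    public static boolean isSmallScan(long estimatedRows, long avgRowBytes) {
        return estimatedRows * avgRowBytes <= HFILE_BLOCK_SIZE;
    }
}
```

The executor would then set `scan.setSmall(isSmallScan(rows, rowBytes))` on the HBase `Scan` object, collapsing the OpenScan/Next/Close round trips into one RPC for qualifying scans.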
[jira] [Commented] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance
[ https://issues.apache.org/jira/browse/TRAFODION-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646414#comment-14646414 ] Eric Owhadi commented on TRAFODION-1420: After code analysis, it turns out that for the small scanner improvement to be significant, we also need to implement 2 different "transactional small scanners". > Use ClientSmallScanner for small scans to improve performance > - > > Key: TRAFODION-1420 > URL: https://issues.apache.org/jira/browse/TRAFODION-1420 > Project: Apache Trafodion > Issue Type: Improvement > Components: sql-cmp, sql-exe >Reporter: Eric Owhadi > Labels: performance > Fix For: 2.0-incubating > > > HBase implements an optimization for small scans (defined as scanning less > than one data block, i.e. 64 KB) resulting in a 3X performance improvement. The > underlying trick is cutting the RPC calls down from 3 (OpenScan/Next/Close) > to 1, and using stateless pread instead of the stateful, locking seek/read > method to read data. This JIRA is about improving the compiler so that it can > determine whether a scan will act on a single data block (small) or not, and > passing this information to the executor so that it can use the right scan > parameter (scan.setSmall(boolean)). > reference: > https://issues.apache.org/jira/browse/HBASE-9488 > https://issues.apache.org/jira/browse/HBASE-7266 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1430) existing feature in HBase .98 that allows performing parallel seeks in the store scanner
Eric Owhadi created TRAFODION-1430: -- Summary: existing feature in HBase .98 that allows performing parallel seeks in the store scanner Key: TRAFODION-1430 URL: https://issues.apache.org/jira/browse/TRAFODION-1430 Project: Apache Trafodion Issue Type: Sub-task Reporter: Eric Owhadi Priority: Minor https://issues.apache.org/jira/browse/HBASE-7495 parallel seek in StoreScanner. Not sure if we ever tried this knob: hbase.storescanner.parallel.seek.enable=true? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1429) enable pread for all scanners, and a dedicated scanner with seek/read for compaction
Eric Owhadi created TRAFODION-1429: -- Summary: enable pread for all scanners, and a dedicated scanner with seek/read for compaction Key: TRAFODION-1429 URL: https://issues.apache.org/jira/browse/TRAFODION-1429 Project: Apache Trafodion Issue Type: Sub-task Reporter: Eric Owhadi Optionally enable p-reads and private readers for compactions: https://issues.apache.org/jira/browse/HBASE-12411 This should be significant for OLTP workloads. Wait until trafodion supports HBase 1.0, then play with the configuration: hbase.storescanner.use.pread hbase.regionserver.compaction.private.readers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1428) bottleneck discovered during Splice Machine testing: Configuration called for a boolean check
Eric Owhadi created TRAFODION-1428: -- Summary: bottleneck discovered during Splice Machine testing: Configuration called for a boolean check Key: TRAFODION-1428 URL: https://issues.apache.org/jira/browse/TRAFODION-1428 Project: Apache Trafodion Issue Type: Sub-task Reporter: Eric Owhadi Priority: Minor StoreScanner calls Configuration for a boolean check on each initialization: https://issues.apache.org/jira/browse/HBASE-12912 To monitor, as the patch is not yet implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1427) List of potential new features in HBase to test and profile performance on.
Eric Owhadi created TRAFODION-1427: -- Summary: List of potential new features in HBase to test and profile performance on. Key: TRAFODION-1427 URL: https://issues.apache.org/jira/browse/TRAFODION-1427 Project: Apache Trafodion Issue Type: Umbrella Components: documentation Affects Versions: 1.1 (pre-incubation), 2.0-incubating Reporter: Eric Owhadi This JIRA is about keeping track of existing or future HBase configuration settings that could impact performance, and recording findings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance
[ https://issues.apache.org/jira/browse/TRAFODION-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643467#comment-14643467 ] Eric Owhadi commented on TRAFODION-1420: Qifan, you recall that small scan was enabled by default. Reading the code, I don't see how. BUT: I do see that there is an HBase JIRA (HBASE-12411, "Optionally enable p-reads and private readers for compactions") about changing the scanner behavior globally to always use pread instead of seek/read (so it behaves like the small scanner except for the RPC count optimization), and it comes hand in hand with compactions that can optionally use private readers (so compaction can still use seek/read). On paper, that looks like the way to go for trafodion's high-concurrency workload, since pread is stateless and no locking happens. Configuration: hbase.storescanner.use.pread on and hbase.regionserver.compaction.private.readers on. Is that what you recall as "small scanner being already on in Trafodion"? > Use ClientSmallScanner for small scans to improve performance > - > > Key: TRAFODION-1420 > URL: https://issues.apache.org/jira/browse/TRAFODION-1420 > Project: Apache Trafodion > Issue Type: Improvement > Components: sql-cmp, sql-exe >Reporter: Eric Owhadi > Labels: performance > Fix For: 2.0-incubating > > > HBase implements an optimization for small scans (defined as scanning less > than one data block, i.e. 64 KB) resulting in a 3X performance improvement. The > underlying trick is cutting the RPC calls down from 3 (OpenScan/Next/Close) > to 1, and using stateless pread instead of the stateful, locking seek/read > method to read data. This JIRA is about improving the compiler so that it can > determine whether a scan will act on a single data block (small) or not, and > passing this information to the executor so that it can use the right scan > parameter (scan.setSmall(boolean)).
> reference: > https://issues.apache.org/jira/browse/HBASE-9488 > https://issues.apache.org/jira/browse/HBASE-7266 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1421) Implement parallel Scanner primitive
[ https://issues.apache.org/jira/browse/TRAFODION-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643324#comment-14643324 ] Eric Owhadi commented on TRAFODION-1421: Anoop, you are talking about "merging sorted streams". In what I was going to implement, the stream seen by the ESP or master executor would not be multiple streams, but a single stream of unsorted data (not random data, but an intermingling of data from multiple regions scanned in parallel, in a single stream). So for operators that need a sorted stream, that parallel scanner would not be appropriate. Hope this is still useful? I guess it is, since you would get multi-threading parallelism on top of ESP (multi-process) parallelism? > Implement parallel Scanner primitive > > > Key: TRAFODION-1421 > URL: https://issues.apache.org/jira/browse/TRAFODION-1421 > Project: Apache Trafodion > Issue Type: Improvement > Components: sql-cmp, sql-exe >Reporter: Eric Owhadi >Assignee: Eric Owhadi > Labels: performance > Fix For: 2.0-incubating > > > The ClientScanner API is serial, to preserve key ordering. However, many > operators don't care about ordering and would rather get the scan results > fast, regardless of order. This JIRA is about providing a parallel scanner > that would take care of splitting the work between all region servers, evenly > if possible. HBase has had a parallel scanner in the pipe for quite some time > (HBASE-9272), but that work has been stalled since October 2013. However, looking at > the available code, it looks like a big part can be leveraged without requiring > a custom HBase build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
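The "single unordered stream" shape described in the comment above can be sketched with plain Java concurrency: one worker per region pushes rows into a shared queue, and the consumer sees an interleaving of all regions rather than a sorted merge. This is a standalone illustration, not the proposed Trafodion implementation; the class name, the use of integers as stand-in rows, and the `Integer.MIN_VALUE` poison-pill sentinel (assumed absent from real data) are all assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class ParallelRegionScan {
    // Sentinel marking end-of-stream; assumed never to appear in real data.
    private static final Integer POISON = Integer.MIN_VALUE;

    // Each inner list models the rows of one region, scanned by its own thread.
    public static List<Integer> scanAll(List<List<Integer>> regions)
            throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        CountDownLatch done = new CountDownLatch(regions.size());
        for (List<Integer> region : regions) {
            pool.execute(() -> {                  // one worker per region
                for (Integer row : region) queue.add(row);
                done.countDown();
            });
        }
        new Thread(() -> {                        // close the stream once all workers finish
            try { done.await(); queue.add(POISON); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }).start();
        List<Integer> out = new ArrayList<>();    // consumer: one intermingled stream
        for (Integer v = queue.take(); !v.equals(POISON); v = queue.take()) out.add(v);
        pool.shutdown();
        return out;
    }
}
```

The consumer gets every row exactly once, but in an arbitrary interleaving — which is exactly why this primitive suits order-insensitive operators only.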
[jira] [Commented] (TRAFODION-1419) Add support for multiple column families in a trafodion table
[ https://issues.apache.org/jira/browse/TRAFODION-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642696#comment-14642696 ] Eric Owhadi commented on TRAFODION-1419: Great, that is what I was thinking would logically happen. The restriction is temporary until hybrid and/or multiple aligned sub-row features show up. > Add support for multiple column families in a trafodion table > - > > Key: TRAFODION-1419 > URL: https://issues.apache.org/jira/browse/TRAFODION-1419 > Project: Apache Trafodion > Issue Type: New Feature >Reporter: Anoop Sharma >Assignee: Anoop Sharma > > This proposal is to add support for multiple column families in trafodion > tables. With this feature, one can store columns in multiple column > families. One use for this would be to store frequently used columns in one > column family and infrequently used columns in a different column family. > That will give a performance improvement when those columns are > retrieved from hbase. There could be other uses as well. > Syntax: > create table <table-name> (<col-fam>.<col-name> <datatype>, ...) > attributes default column family <col-fam>; > alter table <table-name> add column <col-fam>.<col-name> <datatype>; > <col-fam>: name of the column family for that column > Semantics: > The <col-fam> name follows identifier rules. If not double quoted, it > will be upper cased. If double quoted, case will be maintained. > A user-specified column family can be of arbitrary length. To optimize > the space used by the column family stored in a cell, a 2-byte encoding is generated. > The mapping of user-specified column families to encoded column families is stored in > metadata. > If no column family is specified for a column during create table, then > the family specified in the 'attributes default column family' clause is used. > If no 'attributes default column family' clause is specified, then the system > default column family is used. > Column family specification is supported for regular and volatile > tables.
> All unique column families specified during create or alter are added > to the table. > The maximum number of column families supported in one table is 32, but it > is an HBase recommendation not to create too many column families. > An alter statement can be used to assign specific hbase options to > specific column families using the NAME clause. If no NAME clause is specified, then the alter hbase > options are applied to all column families. > invoke and showddl statements will show the original user-specified > column families and not the encoded column families. > Currently, multiple column families are not supported for columns of a > user-created or an implicitly created index. > The default column family of the corresponding base table is used for all > index columns. > A column family cannot be specified in a DML query. > A column family cannot be specified for columns of an aligned row format > table, since all columns are stored as one cell. > Column names must be unique for each table. The same column name cannot > be used as part of multiple column families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1419) Add support for multiple column families in a trafodion table
[ https://issues.apache.org/jira/browse/TRAFODION-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641000#comment-14641000 ] Eric Owhadi commented on TRAFODION-1419: In the future, are we going to support column families with aligned row format? Is the restriction just to avoid boiling the ocean in one shot? Column family support is a great feature and gives back some "columnar" advantages to trafodion. Cool stuff. > Add support for multiple column families in a trafodion table > - > > Key: TRAFODION-1419 > URL: https://issues.apache.org/jira/browse/TRAFODION-1419 > Project: Apache Trafodion > Issue Type: New Feature >Reporter: Anoop Sharma >Assignee: Anoop Sharma > > This proposal is to add support for multiple column families in trafodion > tables. With this feature, one can store columns in multiple column > families. One use for this would be to store frequently used columns in one > column family and infrequently used columns in a different column family. > That will give a performance improvement when those columns are > retrieved from hbase. There could be other uses as well. > Syntax: > create table <table-name> (<col-fam>.<col-name> <datatype>, ...) > attributes default column family <col-fam>; > alter table <table-name> add column <col-fam>.<col-name> <datatype>; > <col-fam>: name of the column family for that column > Semantics: > The <col-fam> name follows identifier rules. If not double quoted, it > will be upper cased. If double quoted, case will be maintained. > A user-specified column family can be of arbitrary length. To optimize > the space used by the column family stored in a cell, a 2-byte encoding is generated. > The mapping of user-specified column families to encoded column families is stored in > metadata. > If no column family is specified for a column during create table, then > the family specified in the 'attributes default column family' clause is used. > If no 'attributes default column family' clause is specified, then the system > default column family is used.
> Column family specification is supported for regular and volatile > tables. > All unique column families specified during create or alter are added > to the table. > The maximum number of column families supported in one table is 32, but it > is an HBase recommendation not to create too many column families. > An alter statement can be used to assign specific hbase options to > specific column families using the NAME clause. If no NAME clause is specified, then the alter hbase > options are applied to all column families. > invoke and showddl statements will show the original user-specified > column families and not the encoded column families. > Currently, multiple column families are not supported for columns of a > user-created or an implicitly created index. > The default column family of the corresponding base table is used for all > index columns. > A column family cannot be specified in a DML query. > A column family cannot be specified for columns of an aligned row format > table, since all columns are stored as one cell. > Column names must be unique for each table. The same column name cannot > be used as part of multiple column families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
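The 2-byte column-family encoding mentioned in the proposal above can be sketched as a simple registry: each distinct user family name gets the next 2-byte code, and the mapping would be persisted in metadata. The encoding scheme here is illustrative, not the actual Trafodion format; the class name and the 1-based code assignment are assumptions, while the 32-family limit comes from the proposal itself:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: assign each user-specified column family a compact 2-byte code
// for storage in cells, keeping the name -> code mapping for metadata.
public class ColumnFamilyEncoder {
    static final int MAX_FAMILIES = 32;           // per-table limit from the proposal
    private final Map<String, byte[]> codes = new LinkedHashMap<>();

    public byte[] encode(String userFamily) {
        byte[] existing = codes.get(userFamily);
        if (existing != null) return existing;    // stable mapping per name
        if (codes.size() >= MAX_FAMILIES)
            throw new IllegalStateException("more than " + MAX_FAMILIES + " column families");
        int n = codes.size() + 1;                 // 1-based code
        byte[] code = { (byte) (n >> 8), (byte) n };
        codes.put(userFamily, code);
        return code;
    }
}
```

Storing the 2-byte code in each cell instead of an arbitrary-length family name is what makes long user-chosen family names free of per-cell storage cost.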
[jira] [Commented] (TRAFODION-1422) Delete column can be dramatically improved (ALTER statement)
[ https://issues.apache.org/jira/browse/TRAFODION-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640976#comment-14640976 ] Eric Owhadi commented on TRAFODION-1422: previous discussions before the JIRA was opened: === On Wed, Jul 22, 2015 at 9:55 PM, Selva Govindarajan < selva.govindara...@esgyn.com> wrote: > I was referring to the drop column scenario. In Trafodion, we delete all > cells given a rowid except for the drop column scenario. Hence > deleteRows is not sending the columns parameter. The deleteRow had > a column parameter to take care of the drop column scenario. So, if we > choose any of the three options mentioned, we can remove the column > parameter in deleteRow and introduce the needed new methods. > > Yes. At least for this case it is possible to create multiple threads > to scan and delete in parallel. However the HTable/RMInterface object is > not thread-safe, so we might need to create as many HTable/RMInterface > objects as the number of threads and ensure it is transactional too. > > > > On Wed, Jul 22, 2015 at 4:09 PM, Eric Owhadi > wrote: > > > Not sure I understand, all this to improve the drop column scenario > > that we > > have considered not important? > > Or are you thinking of another delete scenario? > > > > If we want to optimize further by doing a parallel plan, I don't think > > the optimizer is needed. By using the same mechanism that I am planning > > for ParallelScan that will multi-thread by region, load balancing across > > region servers, just altering the parallelScan to issue a delete > > Rows on each thread to ensure that rows to delete on a multiple > > delete are from the same region. > > > > I agree that coproc would be the fastest method, but is it worth > > going that > > route given the limited scenario? 
> > > > Eric > > > > -Original Message- > > From: Selva Govindarajan [mailto:selva.govindara...@esgyn.com] > > Sent: Wednesday, July 22, 2015 5:17 PM > > To: d...@trafodion.incubator.apache.org > > Subject: RE: optimization of deleteColumns? > > > > Please give consideration to these options to improve the > > performance for > > this scenario > > > > 1) Move the implementation to Java to > > - Reduce JNI to Java transitions > > - Enable multiple deletes > > 2) Use a co-processor to delete > > 3) Introduce a SQL command like DELETE FROM > > > and > > teach the optimizer to do a parallel plan and use rowsets to delete the > > column value. > > > > Selva > > > > > > > > -Original Message- > > From: Eric Owhadi [mailto:eric.owh...@esgyn.com] > > Sent: Wednesday, July 22, 2015 3:28 AM > > To: d...@trafodion.incubator.apache.org > > Subject: Re: optimization of deleteColumns? > > > > Actually, looking at the code further, I believe that there is an > > even more > > important possible improvement (I would guess at least 10 times more > > important than the KeyOnlyFilter trick): > > The code is looping and triggering single deletes instead of > > doing batch deletes. The reason being that the existing deleteRows does > > not take columns as a parameter. But we could alter it to add that. This > > would make a cleaner API and allow doing this optimization. I > > understand that these ALTER operations are not frequently used, but I can > > imagine that a DBA doing schema > > changes on a database with millions of records might not appreciate > > it if it takes too long to drop a column? > > Should we improve? Should I create a JIRA, even if we decide not to > > work on > > it, to document the potential improvement? If you think it is worth > > it, I can > > assign it to myself as a learning exercise to see if I can go through the > > full process? > > Eric > > > > > > On Tue, Jul 21, 2015 at 11:39 PM, Anoop Sharma > > > > wrote: > > > > > Yes, Selva is right. 
This code is used to delete the specified > > > column from all rows of a table if that column exists. > > > This is done as part of 'alter table drop column' command. > > > > > > The specified column is removed from metadata and then from the table. > > > For correctness of just the drop command, one can remove that > > > column from metadata and not remove it from the actual hbase table. > > > This would work since referencing that column in a query will > > > return an error during compile time and one will never reach the > > > point of selecting it from the table. > > > However, if a column is later added with the same name, then > > > incorrect results will be returned due to existing column values > > > that were not deleted during the drop command. > > > > > > anoop > > > > > > -Original Message- > > > From: Selva Govindarajan [mailto:selva.govindara...@esgyn.com] > > > Sent: Tuesday, July 21, 2015 8:47 PM > > > To: d...@trafodion.incubator.apache.org > > > Subject: RE: optimization of deleteColumns? > > > > > > Hi Eric, > > > >
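The batching idea discussed in this thread, replacing one delete RPC per row during drop column with batched multi-row deletes, can be sketched roughly as below. The real Trafodion deleteRows/deleteRow interfaces are not reproduced here: the flush callback is a stand-in for a deleteRows() call extended with a column parameter, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the proposed optimization: accumulate rowids whose dropped
// column must be deleted, and flush them to a (stubbed) batched
// deleteRows call instead of issuing one RPC per row.
public class BatchedColumnDeleter {
    private final int batchSize;
    private final Consumer<List<String>> deleteRows; // stub for the batched delete RPC
    private final List<String> pending = new ArrayList<>();

    public BatchedColumnDeleter(int batchSize, Consumer<List<String>> deleteRows) {
        this.batchSize = batchSize;
        this.deleteRows = deleteRows;
    }

    // Queue one row's column for deletion; flush when the batch is full.
    public void delete(String rowId) {
        pending.add(rowId);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Send any remaining queued deletes as a final partial batch.
    public void flush() {
        if (!pending.isEmpty()) {
            deleteRows.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

With a batch size of N, a table of R rows needs roughly R/N delete RPCs instead of R, which is the whole point of the proposal.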
[jira] [Created] (TRAFODION-1422) Delete column can be dramatically improved (ALTER statement)
Eric Owhadi created TRAFODION-1422: -- Summary: Delete column can be dramatically improved (ALTER statement) Key: TRAFODION-1422 URL: https://issues.apache.org/jira/browse/TRAFODION-1422 Project: Apache Trafodion Issue Type: Improvement Reporter: Eric Owhadi Priority: Minor Fix For: 2.0-incubating The current code path for delete column has not been optimized and can be greatly improved. See the comments below for several ways to implement the optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1421) Implement parallel Scanner primitive
[ https://issues.apache.org/jira/browse/TRAFODION-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640962#comment-14640962 ] Eric Owhadi commented on TRAFODION-1421: previous discussion before opening the JIRA: If data is needed in sorted order for an order by clause or for a merge join, then the optimizer chooses or can potentially choose a plan that will ensure sorted order. This could be done either by reading data in key order if only one partition is being read, or reading data from multiple partitions sequentially if data order is preserved across multiple partitions, or by doing a merge of multiple streams/partitions where each partition is returning data in sorted order, or by doing an external sort on returned data from each partition and then merging them, if needed. Traf opt may or may not be doing all of this at this point. If an ESP is reading data from multiple partitions/regions, and parallel asynchronous functionality is added at the ESP level (this will be similar to the PAPA (parallel access partition access) node in the early implementation), then we need to make sure that the optimizer is aware of this runtime functionality and chooses an appropriate plan by merging sorted streams. anoop -Original Message- From: Eric Owhadi [mailto:eric.owh...@esgyn.com] Sent: Wednesday, July 22, 2015 11:57 AM To: d...@trafodion.incubator.apache.org Subject: Parallel scanner? Hi All, I have been looking at how we currently use the scanner. It looks like it should not be too difficult to inject a parallel scanner instead of the default serial scanner, since in many use cases we don't care about the ordering of the data retrieved. Key question: do we sometimes take advantage of the ordering (to do stuff like merges), or are merges requiring sorting anyway always done at the ESP level? 
The question is to know if we should have an optional serial scanner or parallel scanner (one with sorting preserved, the other not) or if we could always enable the parallel scanner. On implementation details, we can do a sophisticated algorithm to preserve thread resources and auto-scale the parallelism based on the speed of consumption of the code doing next(), or we can simply always go with as many threads as there are regions to scan, accepting the fact that some threads will wait() if the client next() code is not consuming fast enough. I can prototype the simple one, then move to the auto-scaling of threads once done. The reason I need to know if we should keep the serial scanner path is to know if I should create a whole new wiring for the parallel scanner, or if I can just replace the serial scanner with the parallel one (just enabling one or the other at config time for benchmarking purposes). Anybody working on this already, or should I give it a try? Regards, Eric > Implement parallel Scanner primitive > > > Key: TRAFODION-1421 > URL: https://issues.apache.org/jira/browse/TRAFODION-1421 > Project: Apache Trafodion > Issue Type: Improvement > Components: sql-cmp, sql-exe >Reporter: Eric Owhadi >Assignee: Eric Owhadi > Labels: performance > Fix For: 2.0-incubating > > > The ClientScanner API is serial, to preserve key ordering. However, many > operators don't care about ordering and would rather get the scan result > fast, regardless of order. This JIRA is about providing a parallel scanner > that would take care of splitting the work between all region servers evenly > if possible. HBase has had a parallel scanner in the pipe for quite some time > (HBASE-9272), but the work has been stalled since October 2013. However, looking at > the available code, it looks like a big part can be leveraged without requiring > an HBase custom build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
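The simple variant Eric describes, one task per region with no ordering guarantee, might look like the following sketch. Region scans are stubbed as pre-computed row lists; the actual HBase ClientScanner wiring and the wait()-based backpressure are omitted, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Unordered parallel scan sketch: one worker per region funnels rows
// into a shared queue as they arrive, with no attempt to preserve key
// order across regions.
public class ParallelScanSketch {
    public static List<String> scanAll(List<List<String>> regions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        BlockingQueue<String> out = new LinkedBlockingQueue<>();
        CountDownLatch done = new CountDownLatch(regions.size());
        for (List<String> region : regions) {
            pool.submit(() -> {
                try {
                    out.addAll(region); // stands in for a per-region serial scan
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await(); // wait for all region scans to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        List<String> rows = new ArrayList<>();
        out.drainTo(rows);
        return rows;
    }
}
```

A real implementation would hand the queue to the consumer's next() loop instead of draining at the end, which is where the thread-throttling question in the email comes in.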
[jira] [Work started] (TRAFODION-1421) Implement parallel Scanner primitive
[ https://issues.apache.org/jira/browse/TRAFODION-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on TRAFODION-1421 started by Eric Owhadi. -- > Implement parallel Scanner primitive > > > Key: TRAFODION-1421 > URL: https://issues.apache.org/jira/browse/TRAFODION-1421 > Project: Apache Trafodion > Issue Type: Improvement > Components: sql-cmp, sql-exe >Reporter: Eric Owhadi >Assignee: Eric Owhadi > Labels: performance > Fix For: 2.0-incubating > > > The ClientScanner API is serial, to preserve key ordering. However, many > operators don't care about ordering and would rather get the scan result > fast, regardless of order. This JIRA is about providing a parallel scanner > that would take care of splitting the work between all region servers evenly > if possible. HBase has had a parallel scanner in the pipe for quite some time > (HBASE-9272), but the work has been stalled since October 2013. However, looking at > the available code, it looks like a big part can be leveraged without requiring > an HBase custom build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TRAFODION-1421) Implement parallel Scanner primitive
Eric Owhadi created TRAFODION-1421: -- Summary: Implement parallel Scanner primitive Key: TRAFODION-1421 URL: https://issues.apache.org/jira/browse/TRAFODION-1421 Project: Apache Trafodion Issue Type: Improvement Components: sql-cmp, sql-exe Reporter: Eric Owhadi Fix For: 2.0-incubating The ClientScanner API is serial, to preserve key ordering. However, many operators don't care about ordering and would rather get the scan result fast, regardless of order. This JIRA is about providing a parallel scanner that would take care of splitting the work between all region servers evenly if possible. HBase has had a parallel scanner in the pipe for quite some time (HBASE-9272), but the work has been stalled since October 2013. However, looking at the available code, it looks like a big part can be leveraged without requiring an HBase custom build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance
[ https://issues.apache.org/jira/browse/TRAFODION-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640947#comment-14640947 ] Eric Owhadi commented on TRAFODION-1420: previous discussion on the dev list before JIRA creation, for reference: what is codeGen, and do you know of a global setting to change the default? Looking at the source code I don't see where the default scan setting for small can be anything but false? Also looking at the JNI API we have to trigger a scan, there is no param that I can see to play with the small param, so I don't see how with this API the feature could be turned on or off? Let me see about JIRA creation, that would be my first, so need to learn :-) Eric On Fri, Jul 24, 2015 at 1:31 PM, Qifan Chen wrote: During scan improvement work, I recall small scan is turned on by default by HBase and should be turned off for large scans. I wonder if we can confirm that first and how the flag is set during the codeGen phase. This will make the JIRA more accurate. Secondly, welcome to the sql component and I am more than happy to provide any help in the compiler area. Thanks -Qifan Sent from my iPhone > On Jul 24, 2015, at 12:36 PM, Eric Owhadi wrote: > > oh, I see, > Eric > > On Fri, Jul 24, 2015 at 12:34 PM, Carol Pearson > wrote: > >> I saw that. I was more responding to your comment about metadata only being >> read once, so not as necessary to optimize. That's a slightly different >> tangent, but one that gets overlooked at times. Startup time to first >> select is a key metric for a high-performance database, both for initial >> install/setup/upgrade and on a simple restart. >> >> -Carol P. >> >> On Fri, Jul 24, 2015 at 10:30 AM, Eric Owhadi >> wrote: >> >>> Anoop is suggesting to use this not only for metadata, but for any query >>> where the compiler evaluates that it would be appropriate to turn the feature >>> on. 
>>> Eric >>> >>> On Fri, Jul 24, 2015 at 12:26 PM, Carol Pearson < >>> carol.pearson...@gmail.com> >>> wrote: If it only happens once, does that mean that this optimization might >> be a good one at startup time? If one of the failure modes is to bounce something, or our users are simply restarting after some sort of maintenance, startup would hit a lot of metadata all at once. Thanks, -Carol P. On Fri, Jul 24, 2015 at 9:57 AM, Eric Owhadi wrote: > more reading: > https://issues.apache.org/jira/browse/HBASE-9488 > > turns out that the performance improvement comes from 2 sources: the > 3 RPCs collapsed to 1, and the use of pread instead of seek+read. > see the Facebook branch: > https://issues.apache.org/jira/browse/HBASE-7266 > they have enabled pread all the way, even for long scans, using other > features to implement the prefetch needed for long scans. > > I incorrectly stated that the criterion was whether your result set fits in > your cache size. Reading deeper, it looks like the criterion should be: > if the scan range is within one data block (64K) then we should set small > scan. > > looking at the code, it seems that if you incorrectly set small on a non-small > scan, you have just incorrectly optimized. It will work slower... > There were some bugs in the early implementation, which did not handle scan > ranges crossing regions well, but I see that the patches correcting them are in > the branch of HBase we use. > > and yes, the optimization works on the 2 cases you mention. > > on what I am observing, I only saw the traffic on the first invoke; I have not > tried to observe what happens on several runs. So nothing to worry about. > > Given the performance boost on small scans (3X), I think what you propose > "We > do have estimates of accessed rows at compile time and could turn > this opt > on, if rows are small." should be a good candidate to add to the list of > stuff to do to improve perf... 
> > Eric > > > > On Fri, Jul 24, 2015 at 9:29 AM, Anoop Sharma < >> anoop.sha...@esgyn.com> > wrote: >> There are 2 kinds of scans that are done. One is a unique scan where we >> know that only one unique row or a set of unique rows will be returned. >> And the second is a non-unique scan where multiple rows are returned. >> >> Does this optimization apply to both of these cases? >> >> We do have estimates of accessed rows at compile time and could turn this >> opt on, if rows are small. >> What happens if this flag is set and the scan is not small or doesn't fit in >> the cache? Will that work with some perf degradation or will it fail? >> >> We only read metadata information from HBase when the table is used for the >> first
[jira] [Created] (TRAFODION-1420) Use ClientSmallScanner for small scans to improve performance
Eric Owhadi created TRAFODION-1420: -- Summary: Use ClientSmallScanner for small scans to improve performance Key: TRAFODION-1420 URL: https://issues.apache.org/jira/browse/TRAFODION-1420 Project: Apache Trafodion Issue Type: Improvement Components: sql-cmp, sql-exe Reporter: Eric Owhadi Fix For: 2.0-incubating HBase implements an optimization for small scans (defined as scans of less than one data block, i.e. 64 KB), resulting in a 3X performance improvement. The underlying trick is to cut down the RPC calls from 3 (OpenScan/Next/Close) to 1, and to use stateless pread instead of the stateful, locking seek/read method to read data. This JIRA is about making the compiler aware of whether a scan will act on a single data block (small) or not, and passing this information to the executor so that it can use the right parameter for the scan (scan.setSmall(boolean)). reference: https://issues.apache.org/jira/browse/HBASE-9488 https://issues.apache.org/jira/browse/HBASE-7266 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
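The compile-time decision the JIRA proposes could be sketched as a heuristic like the one below. Trafodion's real costing code is more involved; the 64 KB threshold comes from the JIRA text, and the class and method names are illustrative. The boolean result is what the compiler would pass down to the executor for scan.setSmall().

```java
// Sketch of the compile-time "small scan" decision: a scan qualifies as
// small when its estimated byte range fits within one HBase data block
// (64 KB), so the whole scan can be served by a single pread RPC.
public class SmallScanHeuristic {
    static final long DATA_BLOCK_BYTES = 64 * 1024;

    // estimatedRows and avgRowBytes would come from the compiler's
    // cardinality and row-size estimates (assumed inputs here).
    public static boolean isSmallScan(long estimatedRows, long avgRowBytes) {
        return estimatedRows * avgRowBytes <= DATA_BLOCK_BYTES;
    }
}
```

Per the thread above, misclassifying a large scan as small only degrades performance rather than failing, so a conservative estimate is acceptable.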