[jira] [Resolved] (PHOENIX-7253) Metadata lookup performance improvement for range scan queries

2024-03-15 Thread Viraj Jasani (Jira)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani resolved PHOENIX-7253.
---
Resolution: Fixed

> Metadata lookup performance improvement for range scan queries
> --
>
> Key: PHOENIX-7253
> URL: https://issues.apache.org/jira/browse/PHOENIX-7253
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.2.0, 5.1.3
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Critical
> Fix For: 5.2.0, 5.1.4
>
>
> Any considerably large table with more than 100k regions can cause 
> problematic performance if we access all region locations from meta for the 
> given table before generating parallel or sequential scans for the query. 
> The perf impact can really hurt range scan queries.
> Consider a table with hundreds of thousands of tenant views. Unless the query 
> is a strict point lookup, any query on any tenant view ends up retrieving the 
> region locations of all regions of the base table. If an IOException is 
> thrown by the HBase client during any region location lookup in meta, we only 
> perform a single retry.
> Proposal:
>  # All non-point-lookup queries should only retrieve the region locations 
> that cover the scan boundary, instead of fetching all region locations of 
> the base table.
>  # Make the retries configurable, with a higher default value.
>  
> The proposal should improve the performance of the following query types:
>  * Range Scan
>  * Range scan on Salted table
>  * Range scan on Salted table with Tenant id and/or View index id
>  * Range Scan on Tenant connection
>  * Full Scan on Tenant connection
> Here, a full scan on a tenant connection is always a "range scan" over the 
> base table.
> A sample stack trace from the multiple failures observed:
> {code:java}
> java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.Stack 
> trace: java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table 
> regions.
>     at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:620)
>     at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:229)
>     at 
> org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:781)
>     at 
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
>     at 
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
>     at 
> org.apache.phoenix.iterate.DefaultParallelScanGrouper.getRegionBoundaries(DefaultParallelScanGrouper.java:74)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:587)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:936)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:669)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.(BaseResultIterators.java:555)
>     at 
> org.apache.phoenix.iterate.SerialIterators.(SerialIterators.java:69)
>     at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:374)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:222)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:217)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:370)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:328)
>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:328)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:320)
>     at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeQuery(PhoenixPreparedStatement.java:188)
>     ...
>     ...
>     Caused by: java.io.InterruptedIOException: Origin: InterruptedException
>         at 
> org.apache.hadoop.hbase.util.ExceptionUtil.asInterrupt(ExceptionUtil.java:72)
>         at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.takeUserRegionLock(ConnectionImplementation.java:1129)
>         at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:994)
>         at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:895)
>         at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:881)

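Proposal item 1 (retrieving only the region locations that cover the scan boundary) can be sketched in isolation. The helper below is a hypothetical illustration, not the actual Phoenix patch: it selects, from a table's sorted region start keys, only the regions that overlap a scan range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScanRegionFilter {

    // Unsigned lexicographic comparison, matching how HBase orders row keys.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    /**
     * regionStartKeys holds the sorted start keys of all regions: region i
     * spans [regionStartKeys[i], regionStartKeys[i+1]), and the last region
     * is unbounded above. An empty scanStop means the scan is unbounded.
     * Returns the indices of only those regions overlapping
     * [scanStart, scanStop), so callers can skip meta lookups for the rest.
     */
    static List<Integer> regionsCoveringScan(List<byte[]> regionStartKeys,
                                             byte[] scanStart, byte[] scanStop) {
        List<Integer> covered = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            byte[] regionStart = regionStartKeys.get(i);
            byte[] regionEnd = (i + 1 < regionStartKeys.size())
                    ? regionStartKeys.get(i + 1) : null; // null = unbounded above
            boolean startsBeforeScanEnd =
                    scanStop.length == 0 || compare(regionStart, scanStop) < 0;
            boolean endsAfterScanStart =
                    regionEnd == null || compare(regionEnd, scanStart) > 0;
            if (startsBeforeScanEnd && endsAfterScanStart) {
                covered.add(i);
            }
        }
        return covered;
    }
}
```

With 100k+ regions, a real implementation would issue ranged meta lookups rather than enumerate all start keys; the point here is only the overlap predicate.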
[jira] [Updated] (PHOENIX-7281) Test failure LogicalTableNameExtendedIT#testUpdatePhysicalTableName_tenantViews

2024-03-15 Thread Rushabh Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh Shah updated PHOENIX-7281:
--
Summary: Test failure 
LogicalTableNameExtendedIT#testUpdatePhysicalTableName_tenantViews  (was: TF 
LogicalTableNameExtendedIT#testUpdatePhysicalTableName_tenantViews)

> Test failure 
> LogicalTableNameExtendedIT#testUpdatePhysicalTableName_tenantViews
> ---
>
> Key: PHOENIX-7281
> URL: https://issues.apache.org/jira/browse/PHOENIX-7281
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Rushabh Shah
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PHOENIX-7281) TF LogicalTableNameExtendedIT#testUpdatePhysicalTableName_tenantViews

2024-03-15 Thread Rushabh Shah (Jira)
Rushabh Shah created PHOENIX-7281:
-

 Summary: TF 
LogicalTableNameExtendedIT#testUpdatePhysicalTableName_tenantViews
 Key: PHOENIX-7281
 URL: https://issues.apache.org/jira/browse/PHOENIX-7281
 Project: Phoenix
  Issue Type: Sub-task
Reporter: Rushabh Shah






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PHOENIX-7280) Test failure: ViewMetadataIT#testViewAndTableAndDropCascadeWithIndexes

2024-03-15 Thread Rushabh Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh Shah updated PHOENIX-7280:
--
Description: 
Test failure: 
https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-1778/56/testReport/junit/org.apache.phoenix.end2end/ViewMetadataIT/testViewAndTableAndDropCascadeWithIndexes/
The test does the following:
1. Create a data table.
2. Create 2 views on the data table.
3. Create 3 indexes: one on the data table and one on each of the 2 views.
4. Drop the data table with the CASCADE option.
5. Run the drop child views task.
6. Validate that view1 and view2 don't exist.

The test fails in step 5 while dropping view2.

It fails with the following error during the getTable call on the base table.
{noformat}
2024-03-15T11:58:51,682 ERROR 
[RpcServer.Metadata.Fifo.handler=3,queue=0,port=61097] 
coprocessor.MetaDataEndpointImpl(715): getTable failed
java.lang.IllegalArgumentException: offset (0) must be < array length (0)
at 
org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:302)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:408) 
~[hbase-common-2.5.7-hadoop3.jar:2.5.7-hadoop3]
at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:395) 
~[hbase-common-2.5.7-hadoop3.jar:2.5.7-hadoop3]
at 
org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:675)
 ~[classes/:?]
at 
org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:17524)
 ~[classes/:?]
at 
org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7930) 
~[hbase-server-2.5.7-hadoop3.jar:2.5.7-hadoop3]
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2535)
 ~[hbase-server-2.5.7-hadoop3.jar:2.5.7-hadoop3]
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2509)
 ~[hbase-server-2.5.7-hadoop3.jar:2.5.7-hadoop3]
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45014)
 ~[hbase-protocol-shaded-2.5.7-hadoop3.jar:2.5.7-hadoop3]
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:415) 
~[hbase-server-2.5.7-hadoop3.jar:?]
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
~[hbase-server-2.5.7-hadoop3.jar:?]
at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102) 
~[hbase-server-2.5.7-hadoop3.jar:?]
at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82) 
~[hbase-server-2.5.7-hadoop3.jar:?]
{noformat}

The base table is already dropped in step 4, and MetaDataEndpointImpl (MDEI) 
caches a Deleted Table Marker for it. See 
[here|https://github.com/apache/phoenix/blob/PHOENIX-6883-feature/phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L2829-L2831]
 for more details.

{code}
long currentTime = MetaDataUtil.getClientTimeStamp(tableMetadata);
for (ImmutableBytesPtr ckey : invalidateList) {
    metaDataCache.put(ckey, newDeletedTableMarker(currentTime));
}
{code}

DeletedTableMarker is an empty PTable object. See 
[here|https://github.com/apache/phoenix/blob/PHOENIX-6883-feature/phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L1870-L1881]
 for the definition.

{code}
private static PTable newDeletedTableMarker(long timestamp) {
    try {
        return new PTableImpl.Builder()
                .setType(PTableType.TABLE)
                .setTimeStamp(timestamp)
                .setPkColumns(Collections.emptyList())
                .setAllColumns(Collections.emptyList())
                .setFamilyAttributes(Collections.emptyList())
                .setRowKeySchema(RowKeySchema.EMPTY_SCHEMA)
                .setIndexes(Collections.emptyList())
                .setPhysicalNames(Collections.emptyList())
                .build();
    } catch (SQLException e) {
        // Should never happen
        return null;
    }
}
{code}
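The connection to the stack trace above can be shown in isolation: a deleted-table marker carries an empty physical-names list, so the name bytes eventually handed to TableName.valueOf are empty, and a precondition equivalent to the one in the trace rejects them. A minimal sketch mimicking that check (plain Java, not HBase code):

```java
import java.nio.charset.StandardCharsets;

public class EmptyNameCheck {
    // Mimics the precondition TableName.valueOf applies to its name bytes,
    // equivalent to checkArgument(offset < bytes.length). For the empty
    // byte[] produced by a deleted-table marker, this throws exactly the
    // message seen in the log above.
    static String valueOfLike(byte[] bytes, int offset, int length) {
        if (offset >= bytes.length) {
            throw new IllegalArgumentException(
                    "offset (" + offset + ") must be < array length (" + bytes.length + ")");
        }
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}
```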


Now, while dropping view2, MDEI cannot find view2 in its cache, so it tries to 
reconstruct the view. It has to scan SYSTEM.CATALOG (SYSCAT) on the region 
server to construct the view.
It calls the method 
[getTableFromCells|https://github.com/apache/phoenix/blob/PHOENIX-6883-feature/phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L1089]

Since this is a view, its header row will have LINK_TYPE = 2, which links it to 
the physical table. It will go through the code 
[jira] [Created] (PHOENIX-7280) Test failure: ViewMetadataIT#testViewAndTableAndDropCascadeWithIndexes

2024-03-15 Thread Rushabh Shah (Jira)
Rushabh Shah created PHOENIX-7280:
-

 Summary: Test failure: 
ViewMetadataIT#testViewAndTableAndDropCascadeWithIndexes
 Key: PHOENIX-7280
 URL: https://issues.apache.org/jira/browse/PHOENIX-7280
 Project: Phoenix
  Issue Type: Sub-task
Reporter: Rushabh Shah



[jira] [Updated] (PHOENIX-7279) column not found exception when aliased column used in order by of union all query and first query in it also aliased

2024-03-15 Thread Rajeshbabu Chintaguntla (Jira)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajeshbabu Chintaguntla updated PHOENIX-7279:
-
Priority: Critical  (was: Major)

> column not found exception when aliased column used in order by of union all 
> query and first query in it also aliased
> -
>
> Key: PHOENIX-7279
> URL: https://issues.apache.org/jira/browse/PHOENIX-7279
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Rajeshbabu Chintaguntla
>Assignee: Rajeshbabu Chintaguntla
>Priority: Critical
> Fix For: 5.2.0, 5.3.0, 5.1.4
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PHOENIX-7279) column not found exception when aliased column used in order by of union all query and first query in it also aliased

2024-03-15 Thread Rajeshbabu Chintaguntla (Jira)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajeshbabu Chintaguntla updated PHOENIX-7279:
-
Fix Version/s: 5.2.0
   5.3.0
   5.1.4

> column not found exception when aliased column used in order by of union all 
> query and first query in it also aliased
> -
>
> Key: PHOENIX-7279
> URL: https://issues.apache.org/jira/browse/PHOENIX-7279
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Rajeshbabu Chintaguntla
>Assignee: Rajeshbabu Chintaguntla
>Priority: Major
> Fix For: 5.2.0, 5.3.0, 5.1.4
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PHOENIX-7279) column not found exception when aliased column used in order by of union all query and first query in it also aliased

2024-03-15 Thread Rajeshbabu Chintaguntla (Jira)
Rajeshbabu Chintaguntla created PHOENIX-7279:


 Summary: column not found exception when aliased column used in 
order by of union all query and first query in it also aliased
 Key: PHOENIX-7279
 URL: https://issues.apache.org/jira/browse/PHOENIX-7279
 Project: Phoenix
  Issue Type: Bug
Reporter: Rajeshbabu Chintaguntla
Assignee: Rajeshbabu Chintaguntla






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (PHOENIX-7270) Always resolve table before DDL operations.

2024-03-15 Thread Rushabh Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh Shah resolved PHOENIX-7270.
---
Resolution: Fixed

> Always resolve table before DDL operations.
> ---
>
> Key: PHOENIX-7270
> URL: https://issues.apache.org/jira/browse/PHOENIX-7270
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Rushabh Shah
>Assignee: Palash Chauhan
>Priority: Major
>
> After we set UCF = NEVER on all tables, we validate last DDL timestamps for 
> read and write queries.
> For DDL operations, we read the PTable from the client-side cache.
> In some cases, after a DDL operation we update/invalidate the cache entry 
> for the table being altered, but we don't invalidate the cache for the 
> parent table (in the case of views) or for indexes. 
> When column encoding is enabled, we increment the sequence number of the 
> base physical table (for views) whenever we create a view. Refer 
> [here|https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/schema/MetaDataClient.java#L2924-L2931]
>  for more details. Once the CREATE VIEW command executes successfully, we 
> only add the view to the cache; we don't update the base table in the 
> cache. This can cause an inconsistency when we use the same cached PTable 
> object for the next DDL operation on the base table.
> Solutions:
> 1. Validate last DDL timestamps for the table, view hierarchy, and indexes 
> for every DDL operation, as we do for read and write queries.
> 2. Always resolve the table, view hierarchy, and indexes for every DDL 
> operation. This has the same effect as setting UCF to ALWAYS, but only for 
> DDL operations.
> I would prefer option 2 since that guarantees we always get the latest 
> PTable object for DDL operations.
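Solution 1 amounts to a staleness check over the whole hierarchy. A sketch with illustrative names (not Phoenix's actual client API):

```java
import java.util.List;

public class DdlTimestampCheck {
    /**
     * Illustrative only: each long[] pair holds {cachedLastDdlTimestamp,
     * serverLastDdlTimestamp} for one member of the hierarchy (the table
     * itself, its parent in the case of views, and each of its indexes).
     * If any member is stale, the client must re-resolve before the DDL.
     */
    static boolean mustReResolve(List<long[]> hierarchyTimestamps) {
        for (long[] pair : hierarchyTimestamps) {
            if (pair[0] != pair[1]) {
                return true;
            }
        }
        return false;
    }
}
```

Solution 2 skips the comparison entirely and always re-resolves, which is why it guarantees a fresh PTable, at the cost of an extra lookup per DDL.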



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PHOENIX-7278) Add support to dump BAD_ROWS in a directory for CsvBulkLoadTool

2024-03-15 Thread Nihal Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal Jain updated PHOENIX-7278:

Labels: bulkload  (was: )

> Add support to dump BAD_ROWS in a directory for CsvBulkLoadTool 
> 
>
> Key: PHOENIX-7278
> URL: https://issues.apache.org/jira/browse/PHOENIX-7278
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
>  Labels: bulkload
>
> CsvBulkLoadTool should have the functionality to dump BAD_ROWS into a 
> specified directory. This will enhance the tool's error handling capabilities 
> and provide users with a clear understanding of which rows have failed to 
> load during the bulk import process.
>  * CsvBulkLoadTool should have a feature to identify and isolate BAD_ROWS 
> during the bulk import process.
>  * The tool should provide an option for users to specify a directory where 
> these BAD_ROWS will be dumped.
>  * Upon execution, if there are any BAD_ROWS, the tool should create a file 
> in the specified directory containing these rows.
>  * The file should clearly indicate the reason for each row being labeled as 
> BAD_ROW, such as data inconsistency, format error, etc.
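The behavior proposed above can be sketched as a small collector; the names and the rejection rule are illustrative, not the actual CsvBulkLoadTool API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class BadRowCollector {
    final int expectedColumns;
    // Rejected rows, each recorded as "<line>\t<reason>", ready to be
    // written out to the user-specified bad-rows directory.
    final List<String> badRows = new ArrayList<>();

    BadRowCollector(int expectedColumns) {
        this.expectedColumns = expectedColumns;
    }

    // Returns the rejection reason, if any, for one CSV line. Here only a
    // column-count check stands in for real validation (type errors, etc.).
    Optional<String> reasonToReject(String line) {
        String[] cols = line.split(",", -1);
        if (cols.length != expectedColumns) {
            return Optional.of("expected " + expectedColumns
                    + " columns, got " + cols.length);
        }
        return Optional.empty();
    }

    // Good lines pass through; bad lines are recorded with their reason.
    boolean accept(String line) {
        Optional<String> reason = reasonToReject(line);
        reason.ifPresent(r -> badRows.add(line + "\t" + r));
        return reason.isEmpty();
    }
}
```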



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PHOENIX-7278) Add support to dump BAD_ROWS in a directory for CsvBulkLoadTool

2024-03-15 Thread Nihal Jain (Jira)
Nihal Jain created PHOENIX-7278:
---

 Summary: Add support to dump BAD_ROWS in a directory for 
CsvBulkLoadTool 
 Key: PHOENIX-7278
 URL: https://issues.apache.org/jira/browse/PHOENIX-7278
 Project: Phoenix
  Issue Type: Improvement
Reporter: Nihal Jain
Assignee: Nihal Jain





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (PHOENIX-7277) CsvBulkLoadTool should have an option to log error lines

2024-03-15 Thread Nihal Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal Jain updated PHOENIX-7277:

Labels: bulkload  (was: )

> CsvBulkLoadTool should have an option to log error lines
> 
>
> Key: PHOENIX-7277
> URL: https://issues.apache.org/jira/browse/PHOENIX-7277
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Nihal Jain
>Assignee: Anchal Kejriwal
>Priority: Major
>  Labels: bulkload
>
> Similar to how HBase handles bad lines via 'importtsv.log.bad.lines', we 
> should add an optional flag to log bad lines for CsvBulkLoadTool.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PHOENIX-7277) CsvBulkLoadTool should have an option to log error lines

2024-03-15 Thread Nihal Jain (Jira)
Nihal Jain created PHOENIX-7277:
---

 Summary: CsvBulkLoadTool should have an option to log error lines
 Key: PHOENIX-7277
 URL: https://issues.apache.org/jira/browse/PHOENIX-7277
 Project: Phoenix
  Issue Type: Improvement
Reporter: Nihal Jain
Assignee: Anchal Kejriwal


Similar to how HBase handles bad lines via 'importtsv.log.bad.lines', we should 
add an optional flag to log bad lines for CsvBulkLoadTool.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (PHOENIX-7276) Add support for REGEXP_LIKE function

2024-03-15 Thread Nihal Jain (Jira)
Nihal Jain created PHOENIX-7276:
---

 Summary: Add support for REGEXP_LIKE function
 Key: PHOENIX-7276
 URL: https://issues.apache.org/jira/browse/PHOENIX-7276
 Project: Phoenix
  Issue Type: Improvement
Reporter: Nihal Jain
Assignee: Nihal Jain


Currently, Apache Phoenix supports a variety of built-in functions, including a 
few around regex (see 
https://phoenix.apache.org/language/functions.html#regexp_substr), but it lacks 
the REGEXP_LIKE function.

The REGEXP_LIKE function operates by comparing a string to a pattern. If the 
string matches the pattern, the function returns true; otherwise, it returns 
false. This functionality could be invaluable for many Phoenix users who need 
to filter or categorize their data based on specific patterns.
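The semantics described above can be sketched as follows; this is an illustration of the intended behavior, not Phoenix's built-in function implementation (which would be an expression over Phoenix types):

```java
import java.util.regex.Pattern;

public class RegexpLike {
    // Returns true iff the pattern matches somewhere in the string, false
    // otherwise. NULL input is approximated as false in this sketch; the
    // actual SQL NULL semantics would be decided by the implementation.
    static boolean regexpLike(String value, String pattern) {
        if (value == null || pattern == null) {
            return false;
        }
        return Pattern.compile(pattern).matcher(value).find();
    }
}
```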

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)