hudi-bot opened a new issue, #17309:
URL: https://github.com/apache/hudi/issues/17309
Although reads and writes succeed without errors, Spark SQL UPDATE and DELETE do not
write record positions to the log files.
{code:java}
spark-sql (default)> CREATE TABLE testing_positions.table2 (
> ts BIGINT,
> uuid STRING,
> rider STRING,
> driver STRING,
> fare DOUBLE,
> city STRING
> ) USING HUDI
> LOCATION 'file:///Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2'
> TBLPROPERTIES (
> type = 'mor',
> primaryKey = 'uuid',
> preCombineField = 'ts'
> )
> PARTITIONED BY (city);
24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2
Time taken: 0.4 seconds
spark-sql (default)> INSERT INTO testing_positions.table2
> VALUES
> (1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco'),
> (1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70,'san_francisco'),
> (1695046462179,'9909a8b1-2d15-4d3d-8ec9-efc48c536a00','rider-D','driver-L',33.90,'san_francisco'),
> (1695332066204,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'san_francisco'),
> (1695516137016,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'sao_paulo'),
> (1695376420876,'7a84095f-737f-40bc-b62f-6b69664712d2','rider-G','driver-Q',43.40,'sao_paulo'),
> (1695173887231,'3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04','rider-I','driver-S',41.06,'chennai'),
> (1695115999911,'c8abbe79-8d89-47ea-b4ce-4d224bae5bfa','rider-J','driver-T',17.85,'chennai');
24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2
24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2
24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro
24/11/16 12:03:29 WARN log: Updated size to 436166
24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro
24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro
24/11/16 12:03:29 WARN log: Updated size to 436185
24/11/16 12:03:29 WARN log: Updated size to 436386
24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt
24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt
24/11/16 12:03:30 WARN log: Updated size to 436166
24/11/16 12:03:30 WARN log: Updated size to 436386
24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt
24/11/16 12:03:30 WARN log: Updated size to 436185
24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2
24/11/16 12:03:30 WARN log: Updated size to 436166
24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2
24/11/16 12:03:30 WARN log: Updated size to 436386
24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2
24/11/16 12:03:30 WARN log: Updated size to 436185
24/11/16 12:03:30 WARN HiveConf: HiveConf of name
hive.internal.ss.authz.settings.applied.marker does not exist
24/11/16 12:03:30 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout
does not exist
24/11/16 12:03:30 WARN HiveConf: HiveConf of name hive.stats.retries.wait
does not exist
Time taken: 4.843 seconds
spark-sql (default)>
> SET hoodie.merge.small.file.group.candidates.limit = 0;
hoodie.merge.small.file.group.candidates.limit 0
Time taken: 0.018 seconds, Fetched 1 row(s)
spark-sql (default)>
> UPDATE testing_positions.table2 SET fare = 20.0 WHERE
rider = 'rider-A';
24/11/16 12:03:31 WARN SparkStringUtils: Truncated the string representation
of a plan since it was too large. This behavior can be adjusted by setting
'spark.sql.debug.maxToStringFields'.
24/11/16 12:03:32 WARN HoodieFileIndex: Data skipping requires both Metadata
Table and at least one of Column Stats Index, Record Level Index, or Functional
Index to be enabled as well! (isMetadataTableEnabled = false,
isColumnStatsIndexEnabled = false, isRecordIndexApplicable = false,
isFunctionalIndexEnabled = false, isBucketIndexEnable = false,
isPartitionStatsIndexEnabled = false), isBloomFiltersIndexEnabled = false)
24/11/16 12:03:32 WARN HoodieDataBlock: There are records without valid
positions. Skip writing record positions to the data block header.
24/11/16 12:03:34 WARN HiveConf: HiveConf of name
hive.internal.ss.authz.settings.applied.marker does not exist
24/11/16 12:03:34 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout
does not exist
24/11/16 12:03:34 WARN HiveConf: HiveConf of name hive.stats.retries.wait
does not exist
Time taken: 5.545 seconds
spark-sql (default)>
> DELETE FROM testing_positions.table2 WHERE uuid =
'e3cf430c-889d-4015-bc98-59bdce1e530c';
24/11/16 12:03:37 WARN HoodieFileIndex: Data skipping requires both Metadata
Table and at least one of Column Stats Index, Record Level Index, or Functional
Index to be enabled as well! (isMetadataTableEnabled = false,
isColumnStatsIndexEnabled = false, isRecordIndexApplicable = false,
isFunctionalIndexEnabled = false, isBucketIndexEnable = false,
isPartitionStatsIndexEnabled = false), isBloomFiltersIndexEnabled = false)
24/11/16 12:03:37 WARN HoodiePositionBasedFileGroupRecordBuffer: No record
position info is found when attempt to do position based merge.
24/11/16 12:03:37 WARN HoodiePositionBasedFileGroupRecordBuffer: Falling
back to key based merge for Read
24/11/16 12:03:38 WARN HoodieDeleteBlock: There are delete records without
valid positions. Skip writing record positions to the delete block header.
24/11/16 12:03:39 WARN HiveConf: HiveConf of name
hive.internal.ss.authz.settings.applied.marker does not exist
24/11/16 12:03:39 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout
does not exist
24/11/16 12:03:39 WARN HiveConf: HiveConf of name hive.stats.retries.wait
does not exist
Time taken: 2.992 seconds
spark-sql (default)>
> select * from testing_positions.table2;
24/11/16 12:03:41 WARN HoodiePositionBasedFileGroupRecordBuffer: No record
position info is found when attempt to do position based merge.
24/11/16 12:03:41 WARN HoodiePositionBasedFileGroupRecordBuffer: No record
position info is found when attempt to do position based merge.
24/11/16 12:03:41 WARN HoodiePositionBasedFileGroupRecordBuffer: Falling
back to key based merge for Read
24/11/16 12:03:41 WARN HoodiePositionBasedFileGroupRecordBuffer: Falling
back to key based merge for Read
20241116120326527 20241116120326527_0_0
1dced545-862b-4ceb-8b43-d2a568f6616b city=san_francisco
1ba64ef0-bba2-469e-8ef5-696f8cdbe141-0_0-186-338_20241116120326527.parquet
1695332066204 1dced545-862b-4ceb-8b43-d2a568f6616b rider-E driver-O
93.5 san_francisco
20241116120326527 20241116120326527_0_1
e96c4396-3fad-413a-a942-4cb36106d721 city=san_francisco
1ba64ef0-bba2-469e-8ef5-696f8cdbe141-0_0-186-338_20241116120326527.parquet
1695091554788 e96c4396-3fad-413a-a942-4cb36106d721 rider-C driver-M
27.7 san_francisco
20241116120326527 20241116120326527_0_2
9909a8b1-2d15-4d3d-8ec9-efc48c536a00 city=san_francisco
1ba64ef0-bba2-469e-8ef5-696f8cdbe141-0_0-186-338_20241116120326527.parquet
1695046462179 9909a8b1-2d15-4d3d-8ec9-efc48c536a00 rider-D driver-L
33.9 san_francisco
20241116120331896 20241116120331896_0_9
334e26e9-8355-45cc-97c6-c31daf0df330 city=san_francisco
1ba64ef0-bba2-469e-8ef5-696f8cdbe141-0 1695159649087
334e26e9-8355-45cc-97c6-c31daf0df330 rider-A driver-K 20.0
san_francisco
20241116120326527 20241116120326527_1_1
7a84095f-737f-40bc-b62f-6b69664712d2 city=sao_paulo
ba555452-0c3c-47dc-acc0-f90823e12408-0_1-186-339_20241116120326527.parquet
1695376420876 7a84095f-737f-40bc-b62f-6b69664712d2 rider-G driver-Q
43.4 sao_paulo
20241116120326527 20241116120326527_2_0
3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04 city=chennai
8dacb2f9-6901-4ab3-8139-697b51125f16-0_2-186-340_20241116120326527.parquet
1695173887231 3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04 rider-I driver-S
41.06 chennai
20241116120326527 20241116120326527_2_1
c8abbe79-8d89-47ea-b4ce-4d224bae5bfa city=chennai
8dacb2f9-6901-4ab3-8139-697b51125f16-0_2-186-340_20241116120326527.parquet
1695115999911 c8abbe79-8d89-47ea-b4ce-4d224bae5bfa rider-J driver-T
17.85 chennai
Time taken: 1.719 seconds, Fetched 7 row(s) {code}
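The `HoodiePositionBasedFileGroupRecordBuffer` warnings in the transcript above show the reader falling back from position-based to key-based merging when the log blocks carry no record positions. A minimal plain-Python sketch of that fallback logic (illustrative only, not Hudi's actual implementation; the data shapes are simplified):

```python
# Illustrative sketch: merge log-block updates into base-file rows either by
# row position (fast path) or by record key (fallback), mirroring the
# "Falling back to key based merge for Read" behavior seen in the logs.
def merge_log_into_base(base_rows, log_updates):
    """base_rows: list of dicts (with a 'uuid' key), ordered by file position.
    log_updates: list of (position_or_None, record_dict) from a log block."""
    merged = [dict(row) for row in base_rows]
    have_positions = all(pos is not None for pos, _ in log_updates)
    if have_positions:
        # Fast path: apply each update directly at its recorded row position.
        for pos, rec in log_updates:
            merged[pos].update(rec)
    else:
        # Fallback: match updates to base rows by record key (slower).
        by_key = {row["uuid"]: row for row in merged}
        for _, rec in log_updates:
            by_key[rec["uuid"]].update(rec)
    return merged
```

Both paths produce the same result; the position-based path just avoids building and probing the key map, which is why missing positions only cost read performance, not correctness.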
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-8553
- Type: Sub-task
- Parent: https://issues.apache.org/jira/browse/HUDI-9107
- Fix version(s): 1.1.0
---
## Comments
21/Nov/24 21:18 - jonvex: I have verified this with the following script:
{code:java}
SET hoodie.spark.sql.optimized.writes.enable = false;
CREATE TABLE table2 ( ts BIGINT, uuid STRING, rider STRING,
driver STRING, fare DOUBLE, city STRING ) USING HUDI LOCATION
'file:///tmp/testpositions' TBLPROPERTIES ( type = 'mor', primaryKey =
'uuid', preCombineField = 'ts' ) PARTITIONED BY (city);
INSERT INTO table2 VALUES
(1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco'),
(1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70,'san_francisco'),
(1695046462179,'9909a8b1-2d15-4d3d-8ec9-efc48c536a00','rider-D','driver-L',33.90,'san_francisco'),
(1695332066204,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'san_francisco'),
(1695516137016,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'sao_paulo'),
(1695376420876,'7a84095f-737f-40bc-b62f-6b69664712d2','rider-G','driver-Q',43.40,'sao_paulo'),
(1695173887231,'3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04','rider-I','driver-S',41.06,'chennai'),
(1695115999911,'c8abbe79-8d89-47ea-b4ce-4d224bae5bfa','rider-J','driver-T',17.85,'chennai');
SET hoodie.merge.small.file.group.candidates.limit = 0;
UPDATE table2 SET fare = 20.0 WHERE rider = 'rider-A';
DELETE FROM table2 WHERE uuid = 'e3cf430c-889d-4015-bc98-59bdce1e530c';
select * from table2; {code}
I tested with optimized writes enabled and disabled. When optimized writes
are disabled, there is no warning about position fallback.
Here is the run with optimized writes set to false:
{code:java}
spark-sql (default)> SET hoodie.spark.sql.optimized.writes.enable = false;
24/11/21 16:11:45 WARN DFSPropertiesConfiguration: Properties file
file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
24/11/21 16:11:45 WARN DFSPropertiesConfiguration: Cannot find
HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
hoodie.spark.sql.optimized.writes.enable false
Time taken: 0.764 seconds, Fetched 1 row(s)
spark-sql (default)> CREATE TABLE table2 (
> ts BIGINT,
> uuid STRING,
> rider STRING,
> driver STRING,
> fare DOUBLE,
> city STRING
> ) USING HUDI
> LOCATION 'file:///tmp/testpositions'
> TBLPROPERTIES (
> type = 'mor',
> primaryKey = 'uuid',
> preCombineField = 'ts'
> )
> PARTITIONED BY (city);
24/11/21 16:11:52 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table file:/tmp/testpositions
Time taken: 0.384 seconds
spark-sql (default)> INSERT INTO table2
> VALUES
> (1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco'),
> (1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70,'san_francisco'),
> (1695046462179,'9909a8b1-2d15-4d3d-8ec9-efc48c536a00','rider-D','driver-L',33.90,'san_francisco'),
> (1695332066204,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'san_francisco'),
> (1695516137016,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'sao_paulo'),
> (1695376420876,'7a84095f-737f-40bc-b62f-6b69664712d2','rider-G','driver-Q',43.40,'sao_paulo'),
> (1695173887231,'3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04','rider-I','driver-S',41.06,'chennai'),
> (1695115999911,'c8abbe79-8d89-47ea-b4ce-4d224bae5bfa','rider-J','driver-T',17.85,'chennai');
24/11/21 16:12:02 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table file:/tmp/testpositions
24/11/21 16:12:03 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table file:/tmp/testpositions
24/11/21 16:12:05 WARN MetricsConfig: Cannot locate configuration: tried
hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
24/11/21 16:12:05 WARN HoodieBackedTableMetadataWriter: Skipping secondary
index initialization as only one secondary index bootstrap at a time is
supported for now. Provided: []
# WARNING: Unable to attach Serviceability Agent. Unable to attach even with
module exceptions: [org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException:
Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense
failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense
failed.]
24/11/21 16:12:08 WARN HoodieBackedTableMetadataWriter: Skipping secondary
index initialization as only one secondary index bootstrap at a time is
supported for now. Provided: []
Time taken: 5.728 seconds
spark-sql (default)> SET hoodie.merge.small.file.group.candidates.limit = 0;
hoodie.merge.small.file.group.candidates.limit 0
Time taken: 0.012 seconds, Fetched 1 row(s)
spark-sql (default)> UPDATE table2 SET fare = 20.0 WHERE rider = 'rider-A';
24/11/21 16:12:16 WARN SparkStringUtils: Truncated the string representation
of a plan since it was too large. This behavior can be adjusted by setting
'spark.sql.debug.maxToStringFields'.
24/11/21 16:12:16 WARN HoodieFileIndex: Data skipping requires both Metadata
Table and at least one of Column Stats Index, Record Level Index, or Functional
Index to be enabled as well! (isMetadataTableEnabled = false,
isColumnStatsIndexEnabled = false, isRecordIndexApplicable = false,
isFunctionalIndexEnabled = false, isBucketIndexEnable = false,
isPartitionStatsIndexEnabled = false), isBloomFiltersIndexEnabled = false)
24/11/21 16:12:16 WARN HoodieBackedTableMetadataWriter: Skipping secondary
index initialization as only one secondary index bootstrap at a time is
supported for now. Provided: []
24/11/21 16:12:17 WARN HoodieBackedTableMetadataWriter: Skipping secondary
index initialization as only one secondary index bootstrap at a time is
supported for now. Provided: []
Time taken: 1.802 seconds
spark-sql (default)> DELETE FROM table2 WHERE uuid =
'e3cf430c-889d-4015-bc98-59bdce1e530c';
24/11/21 16:12:27 WARN HoodieFileIndex: Data skipping requires both Metadata
Table and at least one of Column Stats Index, Record Level Index, or Functional
Index to be enabled as well! (isMetadataTableEnabled = false,
isColumnStatsIndexEnabled = false, isRecordIndexApplicable = false,
isFunctionalIndexEnabled = false, isBucketIndexEnable = false,
isPartitionStatsIndexEnabled = false), isBloomFiltersIndexEnabled = false)
24/11/21 16:12:27 WARN HoodieBackedTableMetadataWriter: Skipping secondary
index initialization as only one secondary index bootstrap at a time is
supported for now. Provided: []
24/11/21 16:12:27 WARN HoodieBackedTableMetadataWriter: Skipping secondary
index initialization as only one secondary index bootstrap at a time is
supported for now. Provided: []
Time taken: 1.332 seconds
spark-sql (default)> select * from table2;
20241121161203621 20241121161203621_0_0
1dced545-862b-4ceb-8b43-d2a568f6616b city=san_francisco
1ad629cc-6f75-4ac3-bff2-e4f842421f51-0_0-21-67_20241121161203621.parquet
1695332066204 1dced545-862b-4ceb-8b43-d2a568f6616b rider-E driver-O
93.5 san_francisco
20241121161203621 20241121161203621_0_1
e96c4396-3fad-413a-a942-4cb36106d721 city=san_francisco
1ad629cc-6f75-4ac3-bff2-e4f842421f51-0_0-21-67_20241121161203621.parquet
1695091554788 e96c4396-3fad-413a-a942-4cb36106d721 rider-C driver-M
27.7 san_francisco
20241121161203621 20241121161203621_0_2
9909a8b1-2d15-4d3d-8ec9-efc48c536a00 city=san_francisco
1ad629cc-6f75-4ac3-bff2-e4f842421f51-0_0-21-67_20241121161203621.parquet
1695046462179 9909a8b1-2d15-4d3d-8ec9-efc48c536a00 rider-D driver-L
33.9 san_francisco
20241121161216516 20241121161216516_0_1
334e26e9-8355-45cc-97c6-c31daf0df330 city=san_francisco
1ad629cc-6f75-4ac3-bff2-e4f842421f51-0 1695159649087
334e26e9-8355-45cc-97c6-c31daf0df330 rider-A driver-K 20.0
san_francisco
20241121161203621 20241121161203621_1_1
7a84095f-737f-40bc-b62f-6b69664712d2 city=sao_paulo
c06df00f-d40d-42b1-b320-52de6bd05d0e-0_1-21-68_20241121161203621.parquet
1695376420876 7a84095f-737f-40bc-b62f-6b69664712d2 rider-G driver-Q
43.4 sao_paulo
20241121161203621 20241121161203621_2_0
3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04 city=chennai
41db64e9-04c0-4fcb-8378-ce50e0dc7c22-0_2-21-69_20241121161203621.parquet
1695173887231 3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04 rider-I driver-S
41.06 chennai
20241121161203621 20241121161203621_2_1
c8abbe79-8d89-47ea-b4ce-4d224bae5bfa city=chennai
41db64e9-04c0-4fcb-8378-ce50e0dc7c22-0_2-21-69_20241121161203621.parquet
1695115999911 c8abbe79-8d89-47ea-b4ce-4d224bae5bfa rider-J driver-T
17.85 chennai
Time taken: 0.219 seconds, Fetched 7 row(s) {code}
And here is the run without setting optimized writes to false (the setting
defaults to true):
{code:java}
spark-sql (default)> CREATE TABLE table2 (
> ts BIGINT,
> uuid STRING,
> rider STRING,
> driver STRING,
> fare DOUBLE,
> city STRING
> ) USING HUDI
> LOCATION 'file:///tmp/testpositions'
> TBLPROPERTIES (
> type = 'mor',
> primaryKey = 'uuid',
> preCombineField = 'ts'
> )
> PARTITIONED BY (city);
24/11/21 16:14:20 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
24/11/21 16:14:20 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
24/11/21 16:14:20 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/testpositions
Time taken: 1.004 seconds
spark-sql (default)> INSERT INTO table2
> VALUES
> (1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco'),
> (1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70,'san_francisco'),
> (1695046462179,'9909a8b1-2d15-4d3d-8ec9-efc48c536a00','rider-D','driver-L',33.90,'san_francisco'),
> (1695332066204,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'san_francisco'),
> (1695516137016,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'sao_paulo'),
> (1695376420876,'7a84095f-737f-40bc-b62f-6b69664712d2','rider-G','driver-Q',43.40,'sao_paulo'),
> (1695173887231,'3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04','rider-I','driver-S',41.06,'chennai'),
> (1695115999911,'c8abbe79-8d89-47ea-b4ce-4d224bae5bfa','rider-J','driver-T',17.85,'chennai');
24/11/21 16:14:28 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/testpositions
24/11/21 16:14:28 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/testpositions
24/11/21 16:14:30 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
24/11/21 16:14:31 WARN HoodieBackedTableMetadataWriter: Skipping secondary index initialization as only one secondary index bootstrap at a time is supported for now. Provided: []
# WARNING: Unable to attach Serviceability Agent. Unable to attach even with module exceptions: [org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed.]
24/11/21 16:14:33 WARN HoodieBackedTableMetadataWriter: Skipping secondary index initialization as only one secondary index bootstrap at a time is supported for now. Provided: []
Time taken: 5.734 seconds
spark-sql (default)> SET hoodie.merge.small.file.group.candidates.limit = 0;
hoodie.merge.small.file.group.candidates.limit 0
Time taken: 0.016 seconds, Fetched 1 row(s)
spark-sql (default)> UPDATE table2 SET fare = 20.0 WHERE rider = 'rider-A';
24/11/21 16:14:41 WARN SparkStringUtils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
24/11/21 16:14:41 WARN HoodieFileIndex: Data skipping requires both Metadata Table and at least one of Column Stats Index, Record Level Index, or Functional Index to be enabled as well! (isMetadataTableEnabled = false, isColumnStatsIndexEnabled = false, isRecordIndexApplicable = false, isFunctionalIndexEnabled = false, isBucketIndexEnable = false, isPartitionStatsIndexEnabled = false), isBloomFiltersIndexEnabled = false)
24/11/21 16:14:41 WARN HoodieBackedTableMetadataWriter: Skipping secondary index initialization as only one secondary index bootstrap at a time is supported for now. Provided: []
24/11/21 16:14:42 WARN HoodieDataBlock: There are records without valid positions. Skip writing record positions to the data block header.
24/11/21 16:14:42 WARN HoodieBackedTableMetadataWriter: Skipping secondary index initialization as only one secondary index bootstrap at a time is supported for now. Provided: []
Time taken: 1.59 seconds
spark-sql (default)> DELETE FROM table2 WHERE uuid = 'e3cf430c-889d-4015-bc98-59bdce1e530c';
24/11/21 16:14:47 WARN HoodieFileIndex: Data skipping requires both Metadata Table and at least one of Column Stats Index, Record Level Index, or Functional Index to be enabled as well! (isMetadataTableEnabled = false, isColumnStatsIndexEnabled = false, isRecordIndexApplicable = false, isFunctionalIndexEnabled = false, isBucketIndexEnable = false, isPartitionStatsIndexEnabled = false), isBloomFiltersIndexEnabled = false)
24/11/21 16:14:47 WARN HoodieBackedTableMetadataWriter: Skipping secondary index initialization as only one secondary index bootstrap at a time is supported for now. Provided: []
24/11/21 16:14:47 WARN HoodiePositionBasedFileGroupRecordBuffer: No record position info is found when attempt to do position based merge.
24/11/21 16:14:47 WARN HoodiePositionBasedFileGroupRecordBuffer: Falling back to key based merge for Read
24/11/21 16:14:47 WARN HoodieDeleteBlock: There are delete records without valid positions. Skip writing record positions to the delete block header.
24/11/21 16:14:47 WARN HoodieBackedTableMetadataWriter: Skipping secondary index initialization as only one secondary index bootstrap at a time is supported for now. Provided: []
Time taken: 1.103 seconds
spark-sql (default)> select * from table2;
24/11/21 16:14:53 WARN HoodiePositionBasedFileGroupRecordBuffer: No record position info is found when attempt to do position based merge.
24/11/21 16:14:53 WARN HoodiePositionBasedFileGroupRecordBuffer: No record position info is found when attempt to do position based merge.
24/11/21 16:14:53 WARN HoodiePositionBasedFileGroupRecordBuffer: Falling back to key based merge for Read
24/11/21 16:14:53 WARN HoodiePositionBasedFileGroupRecordBuffer: Falling back to key based merge for Read
20241121161428912 20241121161428912_0_0 1dced545-862b-4ceb-8b43-d2a568f6616b city=san_francisco cf8f187a-f827-454d-a26f-114e30c519ed-0_0-21-67_20241121161428912.parquet 1695332066204 1dced545-862b-4ceb-8b43-d2a568f6616b rider-E driver-O 93.5 san_francisco
20241121161428912 20241121161428912_0_1 e96c4396-3fad-413a-a942-4cb36106d721 city=san_francisco cf8f187a-f827-454d-a26f-114e30c519ed-0_0-21-67_20241121161428912.parquet 1695091554788 e96c4396-3fad-413a-a942-4cb36106d721 rider-C driver-M 27.7 san_francisco
20241121161428912 20241121161428912_0_2 9909a8b1-2d15-4d3d-8ec9-efc48c536a00 city=san_francisco cf8f187a-f827-454d-a26f-114e30c519ed-0_0-21-67_20241121161428912.parquet 1695046462179 9909a8b1-2d15-4d3d-8ec9-efc48c536a00 rider-D driver-L 33.9 san_francisco
20241121161441739 20241121161441739_0_1 334e26e9-8355-45cc-97c6-c31daf0df330 city=san_francisco cf8f187a-f827-454d-a26f-114e30c519ed-0 1695159649087 334e26e9-8355-45cc-97c6-c31daf0df330 rider-A driver-K 20.0 san_francisco
20241121161428912 20241121161428912_1_1 7a84095f-737f-40bc-b62f-6b69664712d2 city=sao_paulo 22b6070f-6c72-4a3d-9fc6-8bac16a7e873-0_1-21-68_20241121161428912.parquet 1695376420876 7a84095f-737f-40bc-b62f-6b69664712d2 rider-G driver-Q 43.4 sao_paulo
20241121161428912 20241121161428912_2_0 3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04 city=chennai 878ae75b-bb04-4ed8-8591-8fafc56ed7ba-0_2-21-69_20241121161428912.parquet 1695173887231 3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04 rider-I driver-S 41.06 chennai
20241121161428912 20241121161428912_2_1 c8abbe79-8d89-47ea-b4ce-4d224bae5bfa city=chennai 878ae75b-bb04-4ed8-8591-8fafc56ed7ba-0_2-21-69_20241121161428912.parquet 1695115999911 c8abbe79-8d89-47ea-b4ce-4d224bae5bfa rider-J driver-T 17.85 chennai
Time taken: 0.185 seconds, Fetched 7 row(s) {code}
---
21/Nov/24 22:07 - jonvex: To unblock the release we can do one of two things:
# Disable `hoodie.spark.sql.optimized.writes.enable`
## This will decrease the performance of writes during keygen and index lookup
## Positions will be included in the updates
## We can check whether positions are even enabled and only default it when position writing is disabled
# Keep the code as is
## This will decrease performance when reading uncompacted file groups
## We have the ability to fall back when positions are missing, and I have written extensive test cases to ensure that fallback works correctly in all combinations of log and base files
## This can also use some extra disk space during the read, because we have to rewrite mappings in the spillable map, and deletes to the spillable map don't actually free up space until we close it
To actually write positions using the prepped workflow, I think there is a way to do it, but it will not be that easy:
# We will need to read _tmp_metadata_row_index inside the UPDATE and DELETE SQL commands. Then, during keygen, we will get the position from that field
## This will take some work, because I don't think we have ever tried to read positions at the dataset level
# Then we will need to get the positions out during key generation
## This should be easy
# Then we will need to drop the column before we do the write
## This will probably be pretty easy
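The steps above could be sketched roughly as follows in plain Python (hypothetical; the row shape and helper are illustrative, not actual Hudi or Spark APIs, though `_tmp_metadata_row_index` is the column named in the comment):

```python
# Hypothetical sketch of the proposed prepped flow: rows are read with the
# row-index metadata column attached, the position is extracted during key
# generation, and the column is dropped before the write.
ROW_INDEX_COL = "_tmp_metadata_row_index"

def keygen_with_positions(rows, key_field="uuid"):
    prepped = []
    for row in rows:
        row = dict(row)  # avoid mutating the caller's rows
        # Step 2: pull the position out during key generation;
        # pop() also performs step 3, dropping the column before the write.
        position = row.pop(ROW_INDEX_COL)
        prepped.append((row[key_field], position, row))
    return prepped
```

The output pairs each record key with its base-file position, which is exactly what the log block header needs to record valid positions.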
---
26/Nov/24 00:30 - yihua: Deferring this to Hudi 1.1, since this does not cause
a correctness issue, and adding positional updates and deletes in SQL UPDATE and
DELETE needs design.
---
10/Jan/25 00:18 - yihua: In the UPDATE and DELETE commands, we'll try creating
the relation with a schema that includes the row index meta column, or a new hoodie
meta column, to attach the row index column to the returned DataFrame (this also
requires fixing the wiring so that the file group reader and Parquet reader keep
the new row index column). That way, we can pass the positions down to the prepped
write flow and prepare the HoodieRecords with the current record location.
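As a rough model of that idea (plain Python, not Hudi's API; the field names and helpers are illustrative), attaching a row index on read lets the prepped write flow build records that already carry their current location:

```python
# Toy model: attach a row-index column on read, then use it to tag each
# record with its current location for the prepped write flow.
def attach_row_index(rows):
    # Simulates the reader exposing a row-index meta column per base-file row.
    return [dict(row, _row_index=i) for i, row in enumerate(rows)]

def to_prepped_records(rows, file_id, partition):
    # Each record carries its current location, so the writer can emit
    # valid positions into the log block headers instead of skipping them.
    return [
        {
            "key": row["uuid"],
            "location": {"partition": partition, "file_id": file_id,
                         "position": row["_row_index"]},
        }
        for row in rows
    ]
```

With locations known up front, the writer would no longer hit the "records without valid positions" path shown in the transcripts above.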
---
10/Jan/25 01:59 - yihua: I have a draft PR up which makes the prepped upsert
flow write record positions to the log blocks from the Spark SQL UPDATE statement.
I'm going to fix a few issues before opening it up for review.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]