lwz9103 opened a new issue, #8836:
URL: https://github.com/apache/incubator-gluten/issues/8836
### Backend
CH (ClickHouse)
### Bug description
Test code:
```scala
test("test partitioned with escaped characters") {
  val schema = StructType(
    Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("escape", StringType, nullable = true)
    ))
  val data: Seq[Row] = Seq(
    Row(1, "="),
    Row(2, "/"),
    Row(3, "#"),
    Row(4, ":"),
    Row(5, "\\"),
    Row(6, "\u0001"),
    Row(7, " ")
  )
  val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
  df.createOrReplaceTempView("origin_table")
  spark.sql("select * from origin_table").show()
  spark.sql(s"""
               |DROP TABLE IF EXISTS partition_escape;
               |""".stripMargin)
  spark.sql(s"""
               |CREATE TABLE IF NOT EXISTS partition_escape
               |(
               |  c1 int,
               |  c2 string
               |)
               |USING clickhouse
               |PARTITIONED BY (c2)
               |TBLPROPERTIES (storage_policy='__hdfs_main',
               |               orderByKey='c1',
               |               primaryKey='c1')
               |LOCATION '$HDFS_URL/test/partition_escape'
               |""".stripMargin)
  spark.sql("insert into partition_escape select * from origin_table")
  spark.sql("select * from partition_escape").show()
}
```
Error message:
```
2025-02-26 17:30:50.161 [1][ScalaTest-run-running-GlutenClickHouseMergeTreeWriteOnHDFSSuite] WARN org.apache.spark.sql.catalyst.util.package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
2025-02-26 17:31:32.303 [68][Executor task launch worker for task 0.0 in stage 18.0 (TID 11)] ERROR org.apache.spark.task.TaskResources: Task 11 failed by error:
org.apache.gluten.exception.GlutenException: Unable to open HDFS file: hdfs://127.0.0.1:8020/3-3/test/partition_escape/c2=%3A/5318fb88-759d-4e59-99de-395ceca04ba6_0_001/metadata.gluten. Error: File does not exist: /3-3/test/partition_escape/c2=%3A/5318fb88-759d-4e59-99de-395ceca04ba6_0_001/metadata.gluten
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
    at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:156)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1990)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:768)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:442)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957)
```
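The `c2=%3A` segment in the failing path comes from Spark's partition-path escaping: characters that are unsafe in a directory name (`:`, `/`, `=`, `\`, `#`, control characters, etc.) are percent-encoded when the `c2=<value>` partition directory is built. A minimal sketch of that encoding (the object name and the escape set below are illustrative assumptions; the authoritative character list lives in Spark's `ExternalCatalogUtils`):

```scala
object PartitionEscapeSketch {
  // Illustrative subset of the characters that get percent-encoded in
  // partition directory names (assumption: not the exact list Spark uses).
  private val toEscape: Set[Char] =
    Set('"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u0001')

  // Percent-encode unsafe characters as uppercase %XX, e.g. ":" -> "%3A".
  def escapePathName(value: String): String =
    value.map { c =>
      if (toEscape.contains(c)) "%" + "%02X".format(c.toInt) else c.toString
    }.mkString

  def main(args: Array[String]): Unit = {
    // The ":" partition value from the repro yields the directory name
    // seen in the stack trace.
    println("c2=" + escapePathName(":")) // prints c2=%3A
  }
}
```

If the reader resolves the escaped directory name while the writer created the part under a differently-encoded (or raw) name, the two sides disagree and the `File does not exist` error above would follow; that is consistent with the failure being limited to partition values containing these special characters.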
### Spark version
None
### Spark configurations
_No response_
### System information
_No response_
### Relevant logs
```bash
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]