Fokko commented on pull request #1972: URL: https://github.com/apache/iceberg/pull/1972#issuecomment-753598331
@rdblue Happy new year! :) I looked into this, and it was much more interesting (and time-consuming :-) than I had anticipated. You were close. Allow me to summarize:

It is also good to know that Beam does not output the crc files for the Avro data files, so for file names without colons the crc files are silently ignored.

With the colons, however, it fails at line 145 because it is unable to open the file. It does indeed take the part before the colon as the scheme.

At `Path(Path parent, String child)` in `Path.java`, the parent is `/tmp/test_streaming_crc` and the child is `.output-1970-01-01T00:00:00.000Z-1970-01-01T00:01:00.000Z-00000-of-00001.avro.crc`. The child is split by the Path constructor: it detects the first colon and proceeds if there is no slash. This makes `.output-1970-01-01T00` the scheme and `00:00.000Z-1970-01-01T00:01:00.000Z-00000-of-00001.avro.crc` the file.

However, Java does not accept the colons. It boils down to this:

```java
import java.net.URI;

// normalizePath is the helper that Path.java applies to the path component.

// This works:
new URI(null, null, normalizePath(null, ".00004-e82668ab-25ec-4190-864e-e4d140eb83b5.metadata.json.crc"), null, null);

// This breaks:
new URI(null, null, normalizePath(null, ".output-1970-01-01T00:00:00.000Z-1970-01-01T00:01:00.000Z-00000-of-00001.avro.crc"), null, null);
```

This throws the `Relative path in absolute URI` error. This is still an open issue:

- https://issues.apache.org/jira/browse/HADOOP-3257
- https://issues.apache.org/jira/browse/HADOOP-7945
- https://issues.apache.org/jira/browse/HADOOP-12455
- https://issues.apache.org/jira/browse/HADOOP-14217

I'll add a note to the new naming policy, referring to this issue, and leave this as is for now.
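For anyone who wants to reproduce the failure without a Hadoop dependency, here is a minimal sketch using only `java.net.URI`. The class `ColonPathDemo` and its methods `splitScheme` and `tryBuild` are hypothetical names made up for this illustration; `splitScheme` only approximates the scheme split described in this comment (first colon, with no slash before it) and is not Hadoop's actual code.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ColonPathDemo {

    // Hypothetical helper: approximates the split described in the comment.
    // If a colon appears and no slash precedes it, everything before the
    // colon is treated as the scheme and the remainder as the path.
    static String[] splitScheme(String pathString) {
        int colon = pathString.indexOf(':');
        int slash = pathString.indexOf('/');
        if (colon != -1 && (slash == -1 || colon < slash)) {
            return new String[] {
                pathString.substring(0, colon),
                pathString.substring(colon + 1)
            };
        }
        return new String[] { null, pathString };
    }

    // Builds a URI from the split parts. Returns "ok" on success,
    // otherwise the reason reported by URISyntaxException.
    static String tryBuild(String pathString) {
        String[] parts = splitScheme(pathString);
        try {
            new URI(parts[0], null, parts[1], null, null);
            return "ok";
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        // No colon: no scheme is detected and the relative path is accepted.
        System.out.println(tryBuild(
            ".00004-e82668ab-25ec-4190-864e-e4d140eb83b5.metadata.json.crc"));

        // Colons: ".output-1970-01-01T00" becomes the scheme, and the
        // remainder is a relative path, which java.net.URI rejects.
        System.out.println(tryBuild(
            ".output-1970-01-01T00:00:00.000Z-1970-01-01T00:01:00.000Z-00000-of-00001.avro.crc"));
    }
}
```

The sketch shows why the error message mentions an absolute URI: once a scheme is present, `java.net.URI` requires a non-empty path to start with a slash, and the remainder after the first colon does not.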
