mneedham opened a new pull request #7776:
URL: https://github.com/apache/pinot/pull/7776
I wanted to import a CSV file that contains a DateTime field.
The CSV file looks like this:
```
ID,Date
10224738,09-05-2015T09:58:00
```
And this is the schema file:
```
{
  "schemaName": "crimes",
  "dimensionFieldSpecs": [
    {
      "name": "ID",
      "dataType": "INT"
    }
  ],
  "dateTimeFieldSpecs": [{
    "name": "Date",
    "dataType": "STRING",
    "format" : "1:SECONDS:SIMPLE_DATE_FORMAT:MM-dd-yyyy'T'HH:mm:ss",
    "granularity": "1:HOURS"
  }]
}
```
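As a quick sanity check that the `format` pattern matches the CSV value, the sketch below parses `09-05-2015T09:58:00` with the same pattern string. It uses `java.time.format.DateTimeFormatter`, which is an assumption for illustration: Pinot's own `SIMPLE_DATE_FORMAT` handling may rely on a different date library. The class name `DateFormatCheck` is hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class DateFormatCheck {
    // The SIMPLE_DATE_FORMAT pattern from the "Date" field spec in the schema.
    static final String PATTERN = "MM-dd-yyyy'T'HH:mm:ss";

    // Parse a raw CSV value with java.time (assumed here for illustration;
    // Pinot may use a different date library internally).
    static LocalDateTime parse(String raw) {
        return LocalDateTime.parse(raw, DateTimeFormatter.ofPattern(PATTERN));
    }

    public static void main(String[] args) {
        String raw = "09-05-2015T09:58:00";
        // Parsing succeeds, so the value itself matches the declared format.
        System.out.println("parsed: " + parse(raw));
        // The column is a STRING, so the raw value (colons included) is what
        // becomes the segment's min/max time.
        System.out.println("contains ':' -> " + raw.contains(":"));
    }
}
```

The parse succeeds, so the schema's `format` is consistent with the data; the failure happens later, when the raw string is used to build the segment name.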
But we get this error when running the ingestion job:
```
2021/11/16 11:37:50.382 ERROR [SegmentGenerationJobRunner] [pool-2-thread-1] Failed to generate Pinot segment for file - file:/data/mark.csv
java.lang.IllegalArgumentException: null
    at shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:108) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
    at org.apache.pinot.segment.spi.creator.name.SimpleSegmentNameGenerator.generateSegmentName(SimpleSegmentNameGenerator.java:53) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:268) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:258) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
    at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:119) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
    at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:263) ~[pinot-batch-ingestion-standalone-0.9.0-SNAPSHOT-shaded.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
```
The issue is that the min and max time values don't pass the
`isValidSegmentName` check that was added to `SimpleSegmentNameGenerator`
in https://github.com/apache/pinot/pull/7085. Both values are
`09-05-2015T09:58:00`, and the check fails because they contain `:`
characters. Other characters that commonly appear in date fields, such as
a space or a forward slash, would cause the same failure.
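To illustrate why the generated name is rejected, here is a minimal sketch of a segment name validity check. The rejected character set (`:`, `/`, `\` and whitespace) and the `tableName_minTime_maxTime` name layout are assumptions for illustration, not the exact rule implemented by Pinot's `isValidSegmentName`; `SegmentNameCheck` is a hypothetical class name.

```java
import java.util.regex.Pattern;

public class SegmentNameCheck {
    // HYPOTHETICAL character class, for illustration only: matches ':', '/',
    // '\' and any whitespace. Pinot's real rule may differ.
    private static final Pattern INVALID_CHARS = Pattern.compile("[:/\\\\\\s]");

    // A name is valid if it contains none of the invalid characters.
    static boolean isValidSegmentName(String segmentName) {
        return !INVALID_CHARS.matcher(segmentName).find();
    }

    public static void main(String[] args) {
        // Assumed layout: tableName_minTime_maxTime, using the raw time values.
        String minTime = "09-05-2015T09:58:00";
        String maxTime = "09-05-2015T09:58:00";
        String name = "crimes_" + minTime + "_" + maxTime;
        System.out.println(name + " valid? " + isValidSegmentName(name));
    }
}
```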
This PR replaces those problematic characters inside
`SimpleSegmentNameGenerator` before the `isValidSegmentName` check.
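The approach described above can be sketched as follows: sanitize the min/max time values first, then run the validity check. The replacement character (`_`), the set of problematic characters, and the class and method names are all hypothetical; the PR description does not say exactly which characters are replaced or with what.

```java
import java.util.regex.Pattern;

public class SegmentNameSanitizer {
    // HYPOTHETICAL: the problematic characters (':', '/', '\', whitespace) and
    // the '_' replacement are assumptions for this sketch.
    private static final Pattern PROBLEMATIC = Pattern.compile("[:/\\\\\\s]");

    // Replace each problematic character in a time value with '_'.
    static String sanitize(String timeValue) {
        return PROBLEMATIC.matcher(timeValue).replaceAll("_");
    }

    // A name is valid if it contains none of the problematic characters.
    static boolean isValidSegmentName(String segmentName) {
        return !PROBLEMATIC.matcher(segmentName).find();
    }

    // Hypothetical stand-in for SimpleSegmentNameGenerator.generateSegmentName:
    // sanitize the time values, join the parts, then run the validity check.
    static String generateSegmentName(String tableName, String minTime, String maxTime, int sequenceId) {
        String name = String.join("_", tableName, sanitize(minTime), sanitize(maxTime),
                Integer.toString(sequenceId));
        if (!isValidSegmentName(name)) {
            throw new IllegalArgumentException("Invalid segment name: " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(generateSegmentName("crimes", "09-05-2015T09:58:00", "09-05-2015T09:58:00", 0));
    }
}
```

With the values from the CSV above, `main` prints `crimes_09-05-2015T09_58_00_09-05-2015T09_58_00_0`. This sketch sanitizes only the time values, so an invalid table name would still be rejected by the check.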
--
This is an automated message from the Apache Git Service.