reallocf edited a comment on issue #5268:
URL:
https://github.com/apache/incubator-pinot/issues/5268#issuecomment-616272779
Hey @npawar - I'm interested in helping out on this one. It's my first initiative,
so bear with me if my understanding is wrong. A question on
the requirements:
The idea is that we should be able to filter out data during ingestion, so
if we have a `pinotSchema.json` like
```
{
  "schemaName": "events",
  "dimensionFieldSpecs": [
    {
      "name": "userId",
      "dataType": "LONG",
      "transformFunction": "Groovy({userID}, userID)"
    },
    {
      "name": "fullName",
      "dataType": "STRING",
      "transformFunction": "Groovy({firstName+' '+lastName}, firstName, lastName)"
    },
    {
      "name": "bids",
      "dataType": "INT",
      "singleValueField": false
    },
    {
      "name": "maxBid",
      "dataType": "INT",
      "transformFunction": "Groovy({bids.max{ it.toBigDecimal() }}, bids)"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "impressions",
      "dataType": "LONG",
      "transformFunction": "Groovy({eventType == 'IMPRESSION' ? 1: 0}, eventType)"
    },
    {
      "name": "clicks",
      "dataType": "LONG",
      "transformFunction": "Groovy({eventType == 'CLICK' ? 1: 0}, eventType)"
    },
    {
      "name": "cost",
      "dataType": "DOUBLE"
    },
    {
      "name": "daysSinceEpoch",
      "dataType": "INT",
      "transformFunction": "Groovy({timestamp/(1000*60*60*24)}, timestamp)"
    }
  ],
  "timeFieldSpec": {
    "incomingGranularitySpec": {
      "name": "hoursSinceEpoch",
      "dataType": "LONG",
      "timeFormat" : "EPOCH",
      "timeType": "HOURS",
      "transformFunction": "Groovy({timestamp/(1000*60*60)}, timestamp)"
    }
  }
}
```
we would want to add a new top-level `filter` element to that JSON, holding a
transformFunction, like
```
{
  "schemaName": "events",
  "filter": "Groovy({cost > 42}, cost)",
  "dimensionFieldSpecs": [
    {
      "name": "userId",
      "dataType": "LONG",
      "transformFunction": "Groovy({userID}, userID)"
    },
    ...
  ],
  ...
}
```
Then we would apply a row-level filter using that transformFunction, either
a) On source columns - applying the rest of the transformations AFTER
filtering
or
b) On transformed columns - applying the filtering after the rest of the
transformations.
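To make the difference between (a) and (b) concrete, here is a rough Python sketch of the two orderings. None of these names are Pinot APIs: `transform`, `filter_then_transform`, and `transform_then_filter` are hypothetical stand-ins, and the Groovy expressions are only approximated in plain Python (for example, `daysSinceEpoch` uses integer division here).

```python
# Hypothetical sketch of the two filter orderings; not Pinot code.

MILLIS_PER_DAY = 1000 * 60 * 60 * 24


def transform(row):
    """Stand-in for two of the schema's transformFunctions (simplified)."""
    out = dict(row)
    out["impressions"] = 1 if row["eventType"] == "IMPRESSION" else 0
    # Simplification: integer division instead of Groovy's division.
    out["daysSinceEpoch"] = row["timestamp"] // MILLIS_PER_DAY
    return out


def filter_then_transform(rows, keep):
    """Option (a): filter on source columns, then transform the survivors."""
    return [transform(r) for r in rows if keep(r)]


def transform_then_filter(rows, keep):
    """Option (b): transform every row, then filter on the result."""
    return [t for t in (transform(r) for r in rows) if keep(t)]


rows = [
    {"eventType": "IMPRESSION", "cost": 50.0, "timestamp": 86_400_000},
    {"eventType": "CLICK", "cost": 10.0, "timestamp": 172_800_000},
]


def cost_filter(row):
    """Mirrors the proposed Groovy({cost > 42}, cost) filter."""
    return row["cost"] > 42


def impression_filter(row):
    """A filter on a derived column, which only exists after transform()."""
    return row["impressions"] == 1


kept_a = filter_then_transform(rows, cost_filter)
kept_b = transform_then_filter(rows, cost_filter)
kept_derived = transform_then_filter(rows, impression_filter)
```

In this sketch a filter on a source column such as `cost` keeps the same rows under both orderings, while option (a) skips the transform work for dropped rows. A filter on a derived column such as `impressions` only works under option (b), because that column does not exist before the transformations run.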
Is that all right? Am I on the right track?