#general
@amommendes: @amommendes has joined the channel
#random
@amommendes: @amommendes has joined the channel
#troubleshooting
@mohammedgalalen056: Hi, I hit this error when trying to do batch ingestion from the local file system: `Failed to generate Pinot segment for file - file:data/orders.csv` `java.lang.NumberFormatException: For input string: "2019-05-02 17:49:53"` Here are the dateTimeFieldSpecs in the schema file:
```
"dateTimeFieldSpecs": [
  { "dataType": "STRING", "name": "start_date", "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" },
  { "dataType": "STRING", "name": "end_date", "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" },
  { "dataType": "STRING", "name": "created_at", "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" },
  { "dataType": "STRING", "name": "updated_at", "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" }
]
```
@ken: What’s the full schema? Looks like you’ve got a numeric (metrics or dimensions) field, but the data in your input file is a date.
@mohammedgalalen056:
```
{
  "schemaName": "orders",
  "metricFieldSpecs": [
    { "dataType": "DOUBLE", "name": "total" },
    { "dataType": "FLOAT", "name": "percentage" }
  ],
  "dimensionFieldSpecs": [
    { "dataType": "INT", "name": "id" },
    { "dataType": "STRING", "name": "user_id" },
    { "dataType": "STRING", "name": "worker_id" },
    { "dataType": "INT", "name": "job_id" },
    { "dataType": "DOUBLE", "name": "lat" },
    { "dataType": "DOUBLE", "name": "lng" },
    { "dataType": "INT", "name": "work_place" },
    { "dataType": "STRING", "name": "note" },
    { "dataType": "STRING", "name": "address" },
    { "dataType": "STRING", "name": "canceled_by" },
    { "dataType": "INT", "name": "status" },
    { "dataType": "STRING", "name": "canceled_message" }
  ],
  "dateTimeFieldSpecs": [
    { "dataType": "STRING", "name": "start_date", "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" },
    { "dataType": "STRING", "name": "end_date", "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" },
    { "dataType": "STRING", "name": "created_at", "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" },
    { "dataType": "STRING", "name": "updated_at", "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss", "granularity": "1:DAYS" }
  ]
}
```
@ken: I’d take a few rows of your input data and dump them into Excel, to confirm the order/number of columns matches what you’ve defined in your schema.
@mohammedgalalen056: I've fixed the error, the raw data was corrupted
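For future reference, a quick way to automate the check Ken suggested, without Excel: a minimal sketch in Python, assuming the `data/orders.csv` path from the error above and the four date columns from the schema. It flags rows whose field count differs from the header, and date values that don't parse as `yyyy-MM-dd HH:mm:ss`.
```
import csv
from datetime import datetime

# Date columns from the schema above; values must match yyyy-MM-dd HH:mm:ss.
DATE_COLUMNS = {"start_date", "end_date", "created_at", "updated_at"}

with open("data/orders.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    date_idx = [i for i, name in enumerate(header) if name in DATE_COLUMNS]
    for line_no, row in enumerate(reader, start=2):
        # A corrupted row usually shows up as a wrong field count first.
        if len(row) != len(header):
            print(f"line {line_no}: {len(row)} fields, expected {len(header)}")
            continue
        for i in date_idx:
            try:
                datetime.strptime(row[i], "%Y-%m-%d %H:%M:%S")
            except ValueError:
                print(f"line {line_no}: bad {header[i]} value {row[i]!r}")
```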
@fabricio.dutra87: Hi all, I'm trying to ingest data from Kafka using a topic that doesn't have a datetime column, and I'm receiving this error: ```{"code":400,"error":"Schema should not be null for REALTIME table"}``` I'm using this spec: ```curl -X POST "
@g.kishore: did you upload the schema first?
@fabricio.dutra87: yes, but I had the same error message
@npawar: Can you paste the schema here?
@fabricio.dutra87: I'm not including a timeFieldSpec as I don't have one in my Kafka topic. So it would be nice if there were a way to autofill a datetime column in Pinot. That's the spec:
```
{
  "schemaName": "sch_strimzi_ack",
  "dimensionFieldSpecs": [
    { "name": "column1", "dataType": "STRING" }
  ]
}
```
@chinmay.cerebro: Auto-creating a timestamp column is not supported as of now. Do you have any column in Kafka that we can derive a timestamp from?
@g.kishore: You can probably use the `now()` UDF
@fabricio.dutra87: hmm ok. We will try then to implement the workaround by including the datetime column on that topic. Thanks guys!!
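For reference, the `now()` route maps to Pinot's ingestion transforms (availability depends on the Pinot version). A minimal sketch of the table-config fragment, with `ingestion_ts` as a hypothetical column name:
```
"ingestionConfig": {
  "transformConfigs": [
    {
      "columnName": "ingestion_ts",
      "transformFunction": "now()"
    }
  ]
}
```
Since `now()` returns epoch millis, the schema would also need a matching dateTimeFieldSpec with format `1:MILLISECONDS:EPOCH`.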
@npawar: Also, it's failing in the first place because the schema name doesn't match what you've put in the table config
@npawar: ```sch_strimzi_ack``` vs ```sch_strimzi_acks``` (plural)
@npawar: hence the schema not found exception
@npawar: we can make that exception clearer. Do you mind creating an issue on GitHub?
@fabricio.dutra87: thanks Neha, the error was clearer when I fixed the name: ```{"code":400,"error":"'timeColumnName' cannot be null in REALTIME table config"}```
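That one is fixed in the table config's `segmentsConfig`; a sketch, assuming the derived `ingestion_ts` column from the fragment above:
```
"segmentsConfig": {
  "timeColumnName": "ingestion_ts",
  "timeType": "MILLISECONDS"
}
```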
@falexvr: Hey guys, for some reason every query I send to Pinot returns at most 10 records; it only brings back more than 10 when I specify a limit. Is there something I have to do to get the full set of records?
@g.kishore: yes, the default limit is 10
@g.kishore: you can specify `LIMIT 1000` to get more records
@g.kishore: or 10000
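In query form, with `orders` standing in for the actual table name:
```
SELECT * FROM orders LIMIT 1000
```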
@amommendes: @amommendes has joined the channel
#aggregators
@ita.pai: @ita.pai has joined the channel
@ita.pai: @ita.pai has left the channel
#pinot-dev
@ken: Currently `DistinctCountHLL` only works for single value fields. It seems like a simple change in `DistinctCountHLLAggregationFunction.aggregate()` to check if the `BlockValSet` is multi-valued, and if so then call `BlockValSet.getXXXMV()` and do a sub-iteration on the secondary array it returns. Does that make sense?
@g.kishore: Surprised that it's not supported as of now
@ken: If you run this query on an MVF, you get:
```
"message": "QueryExecutionError:
java.lang.UnsupportedOperationException
    at org.apache.pinot.core.segment.index.readers.ForwardIndexReader.readDictIds(ForwardIndexReader.java:84)
    at org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readStringValues(DataFetcher.java:439)
    at org.apache.pinot.core.common.DataFetcher.fetchStringValues(DataFetcher.java:146)
    at org.apache.pinot.core.common.DataBlockCache.getStringValuesForSVColumn(DataBlockCache.java:194)
    at org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet.getStringValuesSV(ProjectionBlockValSet.java:94)
    at org.apache.pinot.core.query.aggregation.function.DistinctCountHLLAggregationFunction.aggregate(DistinctCountHLLAggregationFunction.java:103)
    at org.apache.pinot.core.query.aggregation.DefaultAggregationExecutor.aggregate(DefaultAggregationExecutor.java:47)
    at org.apache.pinot.core.operator.query.AggregationOperator.getNextBlock(AggregationOperator.java:66)
    at org.apache.pinot.core.operator.query.AggregationOperator.getNextBlock(AggregationOperator.java:35)
    at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49)
    at org.apache.pinot.core.operator.combine.BaseCombineOperator$1.runJob(BaseCombineOperator.java:94)
    at org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)"
```
@ken: I’ll file an issue and generate a PR
@mayanks: @ken can you try `distinctCountHLLMV`?
@mayanks: Aggregation functions on MV columns have an `MV` suffix in the name.
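So on a hypothetical multi-value column `tags` in a table `events`, the call would be:
```
SELECT distinctCountHLLMV(tags) FROM events
```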
@ken: @mayanks Thanks for clarifying, I was confused by seeing `aggregate`, `aggregateGroupBySV`, and `aggregateGroupByMV`. Made me think there was a missing `aggregateMV` function. I see now that the `BySV` and `ByMV` methods are for doing aggregations when the grouping column is SV vs. MV.
@mayanks: :+1:
@ken: @mayanks But why does there need to be a different function? In the implementations the function signatures are the same, and (I assume) the `BlockValSet` could be used to determine whether to handle it as an SV or an MV column.
@mayanks: Yeah, in the future we might merge the two.
@ken: OK, I’ll change my issue description :slightly_smiling_face:
@mayanks: sounds good
#community
@amommendes: @amommendes has joined the channel
#announcements
@amommendes: @amommendes has joined the channel
#getting-started
@phuchdh: @phuchdh has joined the channel
