#general


@vmagotra: @vmagotra has joined the channel
@marta: @marta has joined the channel
@masakal: @masakal has joined the channel
@thuynh: @thuynh has joined the channel
@dovydas: Hello! Does Pinot support importing gzipped data? We have gzipped JSON files in a GCS bucket - can those be imported directly into Pinot, or do we have to serve uncompressed files in GCS?
  @mayanks: At present, segment generation takes uncompressed data. It would be a good enhancement to accept gzipped files. Could you please open an enhancement issue?
  @mayanks: It should be very straightforward to enhance
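  (For reference, the enhancement would mostly amount to wrapping the input stream before record parsing. A minimal sketch in Java; the class and helper name here are hypothetical, not Pinot's actual API:)

  ```java
  import java.io.File;
  import java.io.FileInputStream;
  import java.io.IOException;
  import java.io.InputStream;
  import java.util.zip.GZIPInputStream;

  public final class GzipAwareInput {
    // Hypothetical helper: transparently decompress .gz input files before the
    // record reader parses them; non-gzipped files pass through unchanged.
    public static InputStream open(File dataFile) throws IOException {
      InputStream in = new FileInputStream(dataFile);
      return dataFile.getName().endsWith(".gz") ? new GZIPInputStream(in) : in;
    }
  }
  ```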
  @dovydas: thanks @mayanks I'll create an enhancement issue
  @mayanks: @dovydas Could you try this change:
  @ken: I currently import .gz files (OFFLINE) and it seems to work fine for me…is that what you were asking about?
@jlli: Hello community, We published a blog post for the Pinot 0.6.0 release. Please check out the blog and enjoy some excellent features there! Best Regards, Apache Pinot (incubating) Team
@nishant: @nishant has joined the channel

#random


@vmagotra: @vmagotra has joined the channel
@marta: @marta has joined the channel
@masakal: @masakal has joined the channel
@thuynh: @thuynh has joined the channel
@nishant: @nishant has joined the channel

#troubleshooting


@vmagotra: @vmagotra has joined the channel
@darshants.darshan1: @darshants.darshan1 has joined the channel
@marta: @marta has joined the channel
@joao.comini: Hello guys! I'm having some trouble while running a hybrid table, could someone help me, please? I'm receiving these warnings in the Broker when pushing offline segments to Pinot: ```[BaseBrokerRequestHandler] [jersey-server-managed-async-executor-1] Failed to find time boundary info for hybrid table: transaction``` When I try to run a query, I get a timeout. Server log: ```Timed out while polling results block, numBlocksMerged: 0 (query: QueryContext{_tableName='transaction_REALTIME', _selectExpressions=[count(*)], _aliasMap={}, _filter=transactionDate > '1606971455132', _groupByExpressions=null, _havingFilter=null, _orderByExpressions=null, _limit=10, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9999}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:transaction_REALTIME), filterQuery:FilterQuery(id:0, column:transactionDate, value:[(1606971455132 *)], operator:RANGE, nestedFilterQueryIds:[]), aggregationsInfo:[AggregationInfo(aggregationType:COUNT, aggregationParams:{column=*}, isInSelectList:true, expressions:[*])], filterSubQueryMap:FilterQueryMap(filterQueryMap:{0=FilterQuery(id:0, column:transactionDate, value:[(1606971455132 *)], operator:RANGE, nestedFilterQueryIds:[])}), queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9999}, pinotQuery:PinotQuery(dataSource:DataSource(tableName:transaction_REALTIME), selectList:[_expression_(type:FUNCTION, functionCall:Function(operator:COUNT, operands:[_expression_(type:IDENTIFIER, identifier:Identifier(name:*))]))], filterExpression:_expression_(type:FUNCTION, functionCall:Function(operator:GREATER_THAN, operands:[_expression_(type:IDENTIFIER, identifier:Identifier(name:transactionDate)), _expression_(type:LITERAL, literal:<Literal longValue:1606971455132>)]))), limit:10)})``` If I try to use `Tracing` I get an NPE in the offline servers: ```ERROR [QueryScheduler] [pqr-0] Encountered exception while processing requestId 83 from broker Broker_pinot-broker-0.pinot-broker-headless.pinot.svc.cluster.local_8099
java.lang.NullPointerException: null
    at org.apache.pinot.core.util.trace.TraceContext.getTraceInfo(TraceContext.java:188) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:235) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    at org.apache.pinot.core.query.executor.QueryExecutor.processQuery(QueryExecutor.java:60) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    at org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:155) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]``` I'm running Pinot 0.6.0, btw.
  @joao.comini: If I delete the offline segment, everything works fine
  @mayanks: You may want to check if you have the time column set up correctly
  @joao.comini: Offline table config: ```{ "tableName": "transaction", "tableType": "OFFLINE", "tenants": { "broker": "fraud", "server": "fraud" }, "segmentsConfig": { "schemaName": "transaction", "timeColumnName": "transactionDate", "timeType": "MILLISECONDS", "replication": "2", "segmentPushType": "APPEND", "retentionTimeUnit": "DAYS", "retentionTimeValue": "365" }, "tableIndexConfig": { "loadMode": "MMAP", "invertedIndexColumns": [ "customerUuid" ], "noDictionaryColumns": ["totalValue"], "sortedColumn": [ "customerUuid" ], "segmentPartitionConfig": { "columnPartitionMap": { "customerUuid": { "functionName": "Murmur", "numPartitions": 4 } } } }, "metadata": {}, "routing": { "segmentPrunerTypes": ["partition"] } }``` Realtime table config: ```{ "tableName": "transaction", "tableType": "REALTIME", "tenants": { "broker": "fraud", "server": "fraud", "tagOverrideConfig": { "realtimeConsuming": "fraud_REALTIME", "realtimeCompleted": "fraud_OFFLINE" } }, "segmentsConfig": { "schemaName": "transaction", "timeColumnName": "transactionDate", "timeType": "MILLISECONDS", "replicasPerPartition": "2", "segmentPushType": "APPEND", "segmentPushFrequency": "DAILY", "retentionTimeUnit": "DAYS", "retentionTimeValue": "365", "completionConfig": { "completionMode": "DOWNLOAD" }, "peerSegmentDownloadScheme": "http" }, "tableIndexConfig": { "loadMode": "MMAP", "streamConfigs": { "streamType": "kafka", "stream.kafka.consumer.type": "LowLevel", "stream.kafka.consumer.prop.auto.offset.reset": "smallest", "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory", "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder", "realtime.segment.flush.threshold.rows": "0", "realtime.segment.flush.threshold.time": "4h", "realtime.segment.flush.segment.size": "10M" }, "invertedIndexColumns": [ "customerUuid" ], "noDictionaryColumns": ["totalValue"], "sortedColumn": [ "customerUuid" ], "aggregateMetrics": true, "segmentPartitionConfig": { "columnPartitionMap": { "customerUuid": { "functionName": "Murmur", "numPartitions": 4 } } } }, "metadata": {}, "routing": { "segmentPrunerTypes": ["partition"] } }``` Schema: ```{ "schemaName": "transaction", "dimensionFieldSpecs": [ { "name": "customerUuid", "dataType": "STRING" } ], "metricFieldSpecs": [ { "name": "totalValue", "dataType": "DOUBLE" } ], "dateTimeFieldSpecs": [ { "name": "transactionDate", "dataType": "LONG", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS" } ] }``` Omitted some things for security reasons :slightly_smiling_face:
  @joao.comini: I can't see anything wrong here. Maybe I'm missing something :disappointed:
  @mayanks: Do you see any error messages in the broker for setting time boundary? Like: `Failed to find segment with valid end time for table: {}, no time boundary generated`
  @joao.comini: Yes: `[TimeBoundaryManager] [ClusterChangeHandlingThread] Failed to find segment with valid end time for table: transaction_OFFLINE, no time boundary generated`
  @joao.comini: I have these logs at the controller: ```2020/12/04 16:43:50.103 WARN [ZkBaseDataAccessor] [HelixController-pipeline-default-pinot-(afa2c547_DEFAULT)] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/CURRENTSTATES/102888f0a96000a/transaction_OFFLINE=-101}
2020/12/04 16:43:50.104 WARN [AbstractDataCache] [HelixController-pipeline-default-pinot-(afa2c547_DEFAULT)] znode is null for key: /pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/CURRENTSTATES/102888f0a96000a/transaction_OFFLINE
2020/12/04 16:55:24.471 WARN [TopStateHandoffReportStage] [HelixController-pipeline-default-pinot-(cfba642e_DEFAULT)] Event cfba642e_DEFAULT : Cannot confirm top state missing start time. Use the current system time as the start time.```
  @mayanks: can you check `select max(time)..`?
  @joao.comini: Sometimes I get an exception while running this query and sometimes it returns a result: `1607100225939` ```ERROR [ServerQueryExecutorV1Impl] [pqr-1] Exception processing requestId 179
java.lang.RuntimeException: Caught exception while running CombinePlanNode.
    at org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:151) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    at org.apache.pinot.core.plan.InstanceResponsePlanNode.run(InstanceResponsePlanNode.java:33) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    at org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:45) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:294) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]```
  @mayanks: What is the exact query you are running? Do you have both offline and realtime tables right now?
  @joao.comini: if I try `select max(transactionDate) from transaction_OFFLINE`, it just times out
  @joao.comini: > What is the exact query you are running? Do you have both offline and realtime tables right now? `select max(transactionDate) from transaction`
  @mayanks: Hmm, `select max(transactionDate) from transaction_OFFLINE` should be really fast as it only looks at metadata.
  @mayanks: as in `max` without predicate from _OFFLINE/_REALTIME table
  @joao.comini: Right, maybe some issue with ZooKeeper?
  @mayanks: NO, it is segment metadata
  @mayanks: Check if the offline servers got the query
  @mayanks: The only thing that stands out is that you have MILLIS as time unit. Initially we did not support MILLIS for hybrid tables, but IIRC it was added a while back
  @joao.comini: the offline servers just time out: ```ERROR [BaseCombineOperator] [pqr-0] Timed out while polling results block, numBlocksMerged: 0 (query: QueryContext{_tableName='transaction_OFFLINE', _selectExpressions=[max(transactionDate)], _aliasMap={}, _filter=null, _groupByExpressions=null, _havingFilter=null, _orderByExpressions=null, _limit=10, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9999}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:transaction_OFFLINE), aggregationsInfo:[AggregationInfo(aggregationType:MAX, aggregationParams:{column=transactionDate}, isInSelectList:true, expressions:[transactionDate])], queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9999}, pinotQuery:PinotQuery(dataSource:DataSource(tableName:transaction_OFFLINE), selectList:[_expression_(type:FUNCTION, functionCall:Function(operator:MAX, operands:[_expression_(type:IDENTIFIER, identifier:Identifier(name:transactionDate))]))]), limit:10)})```
  @mayanks: Is that from query to _OFFLINE table?
  @joao.comini: yes
  @mayanks: Hmm, that really does not make sense. It should just loop over segment metadata and find max. How many segments do you have?
  @joao.comini: just one
  @joao.comini: I just ran a job for it
  @mayanks: how big is the segment?
  @joao.comini: 221MB
  @joao.comini: it has 4M docs in it
  @mayanks: That seems fine
  @mayanks: Do you really need MILLIS time stamp?
  @mayanks: You are getting `Failed to find segment with valid end time for table: {}, no time boundary generated` because the query to get max time from offline is timing out
  @mayanks: And due to that, you don't have a time boundary
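  (For context, the time boundary is what lets the broker split a single hybrid-table query across the two physical tables. A rough SQL sketch, with `<timeBoundary>` as a placeholder for the value derived from the offline segments' max end time:)

  ```sql
  -- A query against the hybrid table "transaction" is fanned out as two queries:
  SELECT COUNT(*) FROM transaction_OFFLINE  WHERE transactionDate <= <timeBoundary>;
  SELECT COUNT(*) FROM transaction_REALTIME WHERE transactionDate >  <timeBoundary>;
  -- The broker then merges the two results; without a boundary, no valid split exists.
  ```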
  @mayanks: Can you delete and re-create the offline table?
  @joao.comini: nope, I can change it to SECONDS
  @mayanks: And check if you can get max time from offline table
  @mayanks: Let's do the delete -> recreate of offline table first
  @joao.comini: ok!
  @mayanks: before changing the time unit
  @joao.comini: OK, should I recreate the segment already?
  @mayanks: can you paste the segment metadata here?
  @joao.comini: right, one sec
  @mayanks: `metadata.properties` file in segment folder
  @joao.comini: hmm, the segment folder is empty :anguished:
  @mayanks: I thought you said it is 221MB?
  @joao.comini: yes, it is
  @joao.comini: but there are no segments in the segment folder
  @mayanks: By segment folder I mean the segment itself
  @mayanks: The untarred file
  @joao.comini: oh, right
  @joao.comini: ```segment.padding.character = \u0000
segment.name = transaction_OFFLINE_1607011097024_1607097496995_0
segment.table.name = transaction_OFFLINE
segment.dimension.column.names = customerUuid,transactionId
segment.datetime.column.names = transactionDate
segment.time.column.name = transactionDate
segment.total.docs = 4198229
segment.start.time = 1607011097024
segment.end.time = 1607097496995
segment.time.unit = MILLISECONDS
column.customerUuid.cardinality = 1523662
column.customerUuid.totalDocs = 4198229
column.customerUuid.dataType = STRING
column.customerUuid.bitsPerElement = 21
column.customerUuid.lengthOfEachEntry = 36
column.customerUuid.columnType = DIMENSION
column.customerUuid.isSorted = false
column.customerUuid.hasNullValue = false
column.customerUuid.hasDictionary = true
column.customerUuid.textIndexType = NONE
column.customerUuid.hasInvertedIndex = true
column.customerUuid.isSingleValues = true
column.customerUuid.maxNumberOfMultiValues = 0
column.customerUuid.totalNumberOfEntries = 4198229
column.customerUuid.isAutoGenerated = false
column.customerUuid.partitionFunction = Murmur
column.customerUuid.numPartitions = 20
column.customerUuid.partitionValues = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
column.customerUuid.minValue = 0000167e-d426-4686-b315-31b756daace3
column.customerUuid.maxValue = fffffca6-52a8-4324-bd1c-c26ab3511720
column.customerUuid.defaultNullValue = null
column.transactionDate.cardinality = 3769234
column.transactionDate.totalDocs = 4198229
column.transactionDate.dataType = LONG
column.transactionDate.bitsPerElement = 22
column.transactionDate.lengthOfEachEntry = 0
column.transactionDate.columnType = DATE_TIME
column.transactionDate.isSorted = true
column.transactionDate.hasNullValue = false
column.transactionDate.hasDictionary = true
column.transactionDate.textIndexType = NONE
column.transactionDate.hasInvertedIndex = true
column.transactionDate.isSingleValues = true
column.transactionDate.maxNumberOfMultiValues = 0
column.transactionDate.totalNumberOfEntries = 4198229
column.transactionDate.isAutoGenerated = false
column.transactionDate.datetimeFormat = 1:MILLISECONDS:EPOCH
column.transactionDate.datetimeGranularity = 1:MILLISECONDS
column.transactionDate.minValue = 1607011097024
column.transactionDate.maxValue = 1607097496995
column.transactionDate.defaultNullValue = -9223372036854775808
column.transactionId.cardinality = 4198229
column.transactionId.totalDocs = 4198229
column.transactionId.dataType = LONG
column.transactionId.bitsPerElement = 23
column.transactionId.lengthOfEachEntry = 0
column.transactionId.columnType = DIMENSION
column.transactionId.isSorted = false
column.transactionId.hasNullValue = false
column.transactionId.hasDictionary = true
column.transactionId.textIndexType = NONE
column.transactionId.hasInvertedIndex = true
column.transactionId.isSingleValues = true
column.transactionId.maxNumberOfMultiValues = 0
column.transactionId.totalNumberOfEntries = 4198229
column.transactionId.isAutoGenerated = false
column.transactionId.minValue = 1956565750
column.transactionId.maxValue = 1960763978
column.transactionId.defaultNullValue = -9223372036854775808
segment.index.version = v3```
  @mayanks: ```column.transactionDate.minValue = 1607011097024
column.transactionDate.maxValue = 1607097496995```
  @mayanks: Seems the min/max values of time are set correctly
  @mayanks: What folder are you referring to that is empty?
  @mayanks: can you push the segment and see if it is ONLINE in external view?
  @joao.comini: the folder inside the server
  @joao.comini: 1 sec
  @joao.comini: ```"OFFLINE": { "transaction_OFFLINE_1607011097024_1607097496995_0": { "Server_pinot-server-0.pinot-server-headless.pinot.svc.cluster.local_8098": "ONLINE", "Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098": "ONLINE" } }```
  @mayanks: Have you deleted->recreated already?
  @joao.comini: yes
  @mayanks: can you get max time from offline table now?
  @joao.comini: still no :disappointed:
  @mayanks: can you run any query in offline?
  @mayanks: what about count(*)?
  @joao.comini: min(transactionDate) works, select * works
  @joao.comini: count(*) doesn't work, max(transactionDate) doesn't work
  @mayanks: Hmm, that really does not make sense
  @mayanks: does min return `1607011097024`?
  @joao.comini: yes
  @mayanks: did you say max works sometimes?
  @mayanks: or it never works?
  @mayanks: Are your JVM settings correct?
  @joao.comini: it works sometimes when querying both realtime and offline tables
  @mayanks: For offline only, does it ever work?
  @joao.comini: nope
  @joao.comini: ```jvmOpts: "-Xms512M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:/opt/pinot/gc-pinot-server.log"```
  @joao.comini: I can increase these parameters
  @mayanks: can you change Xms and Xmx both to 4G?
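  (i.e., taking the jvmOpts shown above and raising both heap flags, something like:)

  ```
  jvmOpts: "-Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:/opt/pinot/gc-pinot-server.log"
  ```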
  @joao.comini: yes
  @mayanks: I don't think that for metadata query it matters, but I am very puzzled with what you are describing
  @joao.comini: restarting the servers here
  @joao.comini: well, actually, looks like it matters; I'm not getting any errors anymore
  @joao.comini: max(transactionDate) works fine now
  @mayanks: hmm
  @mayanks: That should fix the time-boundary issue too
  @mayanks: is your original problem solved now?
  @joao.comini: but count(*) returns odd results actually
  @mayanks: odd?
  @mayanks: it is probably coming from realtime?
  @mayanks: or is the query from offline table?
  @joao.comini: it's from both
  @mayanks: try both individually
  @joao.comini: if i run `select count(*) from transaction_REALTIME`, i get `14226`
  @joao.comini: individually they work
  @mayanks: Hmm
  @mayanks: What is the time granularity you care about? Does DAYS work?
  @joao.comini: I think it's related to `Tracing`
  @joao.comini: after I made a query with tracing enabled, everything broke
  @joao.comini: max(transactionDate) doesn't work anymore
  @joao.comini: I need at most `MINUTES` granularity
  @joao.comini: I'll try to recreate the tables with `MINUTES` granularity
  @mayanks: If you feel Tracing is causing an issue, I recommend filing an issue and also posting the issue on pinot-dev
  @ssubrama: I think the time boundary is not being recorded correctly and is probably throwing an exception while processing the EV event?
  @mayanks: It is not being recorded because for some reason the offline query times out
  @joao.comini: ```2020/12/04 18:38:42.604 WARN [TimeBoundaryManager] [HelixTaskExecutor-message_handle_thread] Failed to find segment with valid end time for table: transaction_OFFLINE, no time boundary generated
2020/12/04 18:38:56.598 WARN [BaseBrokerRequestHandler] [jersey-server-managed-async-executor-1] Failed to find time boundary info for hybrid table: transaction```
  @joao.comini: got this at the broker after recreating the tables
  @mayanks: and your offline query is timing out for max, right?
  @joao.comini: Two things may happen:
  • If I *enable* tracing, I get an exception, and after that no other query executes; it just times out or throws an exception
  • If I *don't* enable tracing, the queries run fine, but the results are wrong
  @joao.comini: • `select count(*) from transaction_REALTIME` returns `14243`
  • `select count(*) from transaction_OFFLINE` returns `4198229`
  • `select count(*) from transaction` returns `2406`
  @joao.comini: the first two results are right, but clearly the last one isn't
  @ssubrama: Did you double-check that the time units are correct? Your column is named Date, but the units are millis. Is the offline segment populated in millis or days?
  @joao.comini: yes, it has an AVRO schema, `transactionDate` is a `LONG`. The offline segment is populated in `MINUTES` now
  @joao.comini: added this: ```"ingestionConfig": { "transformConfigs": [ { "columnName": "minutesSinceEpoch", "transformFunction": "toEpochMinutes(transactionDate)" } ] },```
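  (For reference, `toEpochMinutes` truncates epoch millis to whole minutes, i.e. millis / 60000 with the remainder dropped; taking the earlier segment's start time as an example:)

  ```
  toEpochMinutes(1607011097024) = 1607011097024 / 60000 = 26783518 (minutes since epoch)
  ```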
  @joao.comini: one thing: does the `segmentPartitionConfig` have to be the same for both realtime and offline tables?
  @joao.comini: after changing the tables to `MINUTES`, tracing is not breaking the server anymore
  @mayanks: Are you saying that your problem is resolved with minutes?
  @joao.comini: about the timeouts and exceptions, yes, but I'm still getting wrong results like above
  @joao.comini: I'm probably missing something here. Do the `segmentPushFrequency` and `segmentPushType` configs influence time boundary management? Is there something I should care about when pushing offline segments?
  @mayanks: @jackie.jxt ^^ can you comment on time boundary with minutes and above configs?
  @mayanks: Also count(*) seems to return incorrect answers
  @jackie.jxt: Reading the context
  @joao.comini: One more piece of info: my topic has a retention of 10 days, that is, the realtime table has data from `2020-11-23 18:46:00` to `2020-12-04 11:23:00` as I created it today, and the offline segment that I pushed is a daily segment from day `2020-12-03` to `2020-12-04`
  @jackie.jxt: @joao.comini Can you please check the segment ZK metadata? You may use the zookeeper browser
  @jackie.jxt: Besides, the segment is not really partitioned (not related to the time boundary issue)
  @joao.comini: ```{ "id": "transaction_OFFLINE_26781300_26782739_0", "simpleFields": { "segment.crc": "2804350904", "segment.creation.time": "1607129889618", "segment.end.time": "26782739", "segment.index.version": "v3", "segment.name": "transaction_OFFLINE_26781300_26782739_0", "segment.offline.download.url": "", "segment.offline.push.time": "1607129929914", "segment.offline.refresh.time": "-9223372036854775808", "segment.partition.metadata": "{\"columnPartitionMap\":{\"customerUuid\":{\"numPartitions\":5,\"partitions\":[0,1,2,3,4],\"functionName\":\"Murmur\"}}}", "segment.start.time": "26781300", "segment.table.name": "transaction_OFFLINE", "segment.time.unit": "MINUTES", "segment.total.docs": "3736571", "segment.type": "OFFLINE" }, "mapFields": {}, "listFields": {} }```
  @joao.comini: the realtime segments look like this: ```{ "id": "transaction__0__0__20201205T0031Z", "simpleFields": { "segment.crc": "-1", "segment.creation.time": "1607128311373", "segment.end.time": "-1", "segment.flush.threshold.size": "100000", "segment.flush.threshold.time": null, "segment.index.version": null, "segment.name": "transaction__0__0__20201205T0031Z", "segment.partition.metadata": "{\"columnPartitionMap\":{\"customerUuid\":{\"numPartitions\":20,\"partitions\":[0],\"functionName\":\"Murmur\"}}}", "segment.realtime.download.url": null, "segment.realtime.endOffset": "9223372036854775807", "segment.realtime.numReplicas": "2", "segment.realtime.startOffset": "8407", "segment.realtime.status": "IN_PROGRESS", "segment.start.time": "-1", "segment.table.name": "transaction_REALTIME", "segment.time.unit": "null", "segment.total.docs": "-1", "segment.type": "REALTIME" }, "mapFields": {}, "listFields": {} }```
  @joao.comini: they're all consuming segments
  @jackie.jxt: Do you have other realtime segments? Or this is the only one?
  @joao.comini: I have one for each partition of the topic
  @joao.comini: ``` { "id": "transaction__10__0__20201205T0031Z", "simpleFields": { "segment.crc": "-1", "segment.creation.time": "1607128311373", "segment.end.time": "-1", "segment.flush.threshold.size": "100000", "segment.flush.threshold.time": null, "segment.index.version": null, "segment.name": "transaction__10__0__20201205T0031Z", "segment.partition.metadata": "{\"columnPartitionMap\":{\"customerUuid\":{\"numPartitions\":20,\"partitions\":[10],\"functionName\":\"Murmur\"}}}", "segment.realtime.download.url": null, "segment.realtime.endOffset": "9223372036854775807", "segment.realtime.numReplicas": "2", "segment.realtime.startOffset": "8509", "segment.realtime.status": "IN_PROGRESS", "segment.start.time": "-1", "segment.table.name": "transaction_REALTIME", "segment.time.unit": "null", "segment.total.docs": "-1", "segment.type": "REALTIME" }, "mapFields": {}, "listFields": {} }```
  @joao.comini: ```2020/12/05 00:55:04.939 WARN [TimeBoundaryManager] [HelixTaskExecutor-message_handle_thread] Failed to find segment with valid end time for table: transaction_OFFLINE, no time boundary generated
2020/12/05 00:55:04.975 WARN [TimeBoundaryManager] [ClusterChangeHandlingThread] Failed to find segment with valid end time for table: transaction_OFFLINE, no time boundary generated```
  @jackie.jxt: Do you see other warning besides these?
  @jackie.jxt: Also, does this happen before you push the first offline segment?
  @joao.comini: nope, just after I push the segment
  @joao.comini: ```2020/12/05 00:58:49.971 WARN [TopStateHandoffReportStage] [HelixController-pipeline-default-pinot-(c508fcc8_DEFAULT)] Event c508fcc8_DEFAULT : Cannot confirm top state missing start time. Use the current system time as the start time.
2020/12/05 00:58:50.004 WARN [ZkBaseDataAccessor] [ZkClient-EventThread-31-pinot-zookeeper:2181] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/CURRENTSTATES/102888f0a96001c/transaction_OFFLINE=-101}
2020/12/05 00:58:50.005 WARN [ZkBaseDataAccessor] [HelixController-pipeline-task-pinot-(c508fcc8_TASK)] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/CURRENTSTATES/102888f0a96001c/transaction_OFFLINE=-101}
2020/12/05 00:58:50.005 WARN [AbstractDataCache] [HelixController-pipeline-task-pinot-(c508fcc8_TASK)] znode is null for key: /pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/CURRENTSTATES/102888f0a96001c/transaction_OFFLINE```
  @joao.comini: got these at the controller as well
  @jackie.jxt: I think I know the reason for this behavior
  @jackie.jxt: There is no overlap between the realtime and offline data, so the time boundary won't merge the results properly
  @jackie.jxt: Can you try `select min(transactionDate) from transaction_REALTIME` and also `select min(transactionDate) from transaction` ?
  @jackie.jxt: In order for a hybrid table to work, there has to be time overlap between the realtime table and the offline table
  @jackie.jxt: With the current config, the time boundary should be set at `offline end time - 1DAY` = `Dec 2nd, 2:59:00 AM`
  @jackie.jxt: It won't query any data from the offline table, but the realtime side does not have the data for this time span, and that's the reason why the results don't match
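  (Working that out from the offline segment's ZK metadata above, with all values in MINUTES:)

  ```
  time boundary = segment.end.time - 1 DAY
                = 26782739 - 1440
                = 26781299  (2020-12-02 02:59:00 UTC)
  ```

  Since the offline segment starts at 26781300, the offline side of the split contributes no rows at or below that boundary.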
  @joao.comini: min(minutesSinceEpoch) from realtime -> 26769026 -> Monday, 23 November 2020 14:26:00
  min(minutesSinceEpoch) from both -> 26781777 -> Wednesday, 2 December 2020 10:57:00
  @jackie.jxt: Can you also try `select min(transactionDate) from transaction_REALTIME where transactionDate > 26781299`? I think it should return the same result as querying both
  @joao.comini: yes, you're right: `26781777`
  @jackie.jxt: Here the time boundary is correctly generated, but because the offline data and realtime data are not in sync, it returns unexpected results
  @joao.comini: what do you mean with "in sync"?
  @jackie.jxt: For the time span of the offline segment (`26781300 to 26782739`), the realtime table should have the same data already consumed
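  (A quick way to verify that, reusing the offline segment's span from the metadata above; the column and minute values here assume the new minute-granularity setup:)

  ```sql
  -- Should return a non-zero count if the realtime table has already
  -- consumed the time span covered by the offline segment:
  SELECT COUNT(*) FROM transaction_REALTIME
  WHERE transactionDate BETWEEN 26781300 AND 26782739;
  ```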
  @mayanks: I thought real-time has last 7 days of data?
@masakal: @masakal has joined the channel
@thuynh: @thuynh has joined the channel
@nishant: @nishant has joined the channel

#pinot-docs


@yupeng: @yupeng has joined the channel
@yupeng: shall we add versions to docs?
@yupeng: so that we know the config/function signatures/features of previous versions
@g.kishore: we have that already, looks like we did not create it for 0.5.0
@yupeng: no 0.6 either
@yupeng: in fact only 0.4
@g.kishore: agree, we should create that
@g.kishore: it's as simple as creating a branch at that commit
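(e.g., something along these lines, with the branch name and release SHA as placeholders:)

```
git branch 0.6.0 <release-commit-sha>
git push origin 0.6.0
```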
@yupeng: yeah
@yupeng: thx
@g.kishore: does anyone know the commit versions?