#general
@nicole: @nicole has joined the channel
@awadesh.kumar: @awadesh.kumar has joined the channel
@very312: @very312 has joined the channel
@arpitc0707: @arpitc0707 has joined the channel
@devlearn75: @devlearn75 has joined the channel
@aylwin.souza: @aylwin.souza has joined the channel
@vinayv: @vinayv has joined the channel
#random
@nicole: @nicole has joined the channel
@awadesh.kumar: @awadesh.kumar has joined the channel
@very312: @very312 has joined the channel
@arpitc0707: @arpitc0707 has joined the channel
@devlearn75: @devlearn75 has joined the channel
@aylwin.souza: @aylwin.souza has joined the channel
@vinayv: @vinayv has joined the channel
#troubleshooting
@nicole: @nicole has joined the channel
@piyush.chauhan: I am facing an issue using the JDBC client for Pinot. I am able to query the broker via Postman, but I get the following error: ```Failed to connect to url : jdbc:pinot://<broker-url> java.util.concurrent.ExecutionException: org.apache.pinot.client.PinotClientException: Pinot returned HTTP status 308, expected 200``` I am using version 0.8.0 and following this guide
@kennybastani: Seems like you might not have access to the Pinot broker from your client. The JDBC client negotiates with the Pinot controller (port 9000 by default) to get the host URL of the Pinot broker (port 8099 by default).
@kennybastani: Make sure you use the controller URL in your JDBC connection string, and ensure that you have access to the broker from your host machine.
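For anyone hitting the same thing, here is a minimal sketch of that connection flow with the Pinot JDBC client (the host, port, and table name are placeholders; this assumes `pinot-jdbc-client` is on the classpath):
```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.pinot.client.PinotDriver;

public class PinotJdbcExample {
  public static void main(String[] args) throws Exception {
    // Register the Pinot JDBC driver explicitly in case it is not
    // picked up automatically from the classpath.
    DriverManager.registerDriver(new PinotDriver());

    // Point the URL at the controller (default port 9000), not the broker;
    // the driver asks the controller for the broker address itself.
    try (Connection conn = DriverManager.getConnection("jdbc:pinot://localhost:9000");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT * FROM myTable LIMIT 10")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}
```
The key point, per @kennybastani's note above, is that the JDBC URL targets the controller rather than the broker.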
@surajkmth29: Hi folks, in a case where a lookup join returns null values for some rows, how can we filter out the nulls? I am trying something like: `select lookup('tableB', 'username', 'orgId', orgId, 'userId', userId) as username from tableA where username is not null limit 10` but I see an error: ```Unsupported predicate type: IS_NOT_NULL``` Full error screenshot attached
@awadesh.kumar: @awadesh.kumar has joined the channel
@valentin: Hello, on the documentation (
@g.kishore: We have a PR out. cc @atri.sharma @richard892
@very312: @very312 has joined the channel
@arpitc0707: @arpitc0707 has joined the channel
@very312: Hi team, my team is struggling with the “upsert” feature. Here is the problem: we want to build a realtime streaming user_state_table as a unique (upsert) table, where 3 different events can update user_state_table. These 3 events have different columns, but the primary key and time dimension (create_time) are included in all 3, and the schema of user_state_table is the union of the columns of these 3 events. Let's say event_1 has columns (id, create_time, a, b) and event_2 has columns (id, create_time, c, d), and we set the upsert mode to full. If we publish event_1 -> event_2 -> event_1 in that order, columns c and d become null, even though we want the last event_1's columns a & b to overwrite the first event_1's while event_2's columns stay non-null. How could we solve this problem by modifying the table JSON and schema JSON? Please see the comments below for the configs my team is using. Many thanks from Korea!
@very312: user state table
```
{
  "tableName": "user_state",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "create_time",
    "timeType": "SECONDS",
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "user_state",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowLevel",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
      "stream.kafka.topic.name": "user.event",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.hlc.zk.connect.string": "z-1.kafka-engine-v2.robgzv.c2.
```
@very312: user state schema
```
{
  "schemaName": "user_state",
  "primaryKeyColumns": [ "id" ],
  "dimensionFieldSpecs": [
    { "name": "id", "dataType": "STRING" },
    { "name": "a", "dataType": "STRING" },
    { "name": "b", "dataType": "STRING" },
    { "name": "c", "dataType": "STRING" },
    { "name": "d", "dataType": "STRING" }
  ],
  "metricFieldSpecs": [],
  "dateTimeFieldSpecs": [
    {
      "name": "create_time",
      "dataType": "STRING",
      "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-ddTHH:mm:ss",
      "granularity": "1:SECONDS"
    }
  ]
}
```
@kennybastani: I'm not aware of any way to resolve this situation, since you have multiple different event schemas that you want to merge into a partial upsert. This does sound like an interesting use case though. Maybe it's something we can add to Pinot in the future.
@kennybastani: We might be able to add a UDF called `Fold` that selects the ordered events and folds the most recent non-null field into the most recent event record.
@kennybastani: @jackie.jxt What do you think?
@jackie.jxt: I think you should be able to use partial upsert and define all 4 columns as overwrite
@jackie.jxt: Adding @yupeng
@kennybastani: Thanks @jackie.jxt. I haven’t played with that yet. Seems like it should work as described for this scenario.
@kennybastani: It does look like the result will be returned as a set, which will still include the new item. I will test this out later.
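For reference, a minimal sketch of what @jackie.jxt's suggestion might look like in the table config (untested here; the `upsertConfig` block is the documented partial-upsert mechanism, the column names come from the schema above, and per the Pinot docs partial upsert also needs `nullHandlingEnabled` set in `tableIndexConfig`):
```
"tableIndexConfig": {
  "nullHandlingEnabled": true
},
"upsertConfig": {
  "mode": "PARTIAL",
  "partialUpsertStrategies": {
    "a": "OVERWRITE",
    "b": "OVERWRITE",
    "c": "OVERWRITE",
    "d": "OVERWRITE"
  }
}
```
With OVERWRITE, a column that is null in an incoming event should keep its previous value, so event_2's c and d would survive a later event_1.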
@very312: My team will get started on it! Let me share the results!!
@kennybastani: Great @very312. :slightly_smiling_face:
@devlearn75: @devlearn75 has joined the channel
@aylwin.souza: @aylwin.souza has joined the channel
@vinayv: @vinayv has joined the channel
#onboarding
@piyush.chauhan: @piyush.chauhan has joined the channel
#metrics-plugin-impl
@drocha: @drocha has joined the channel
@drocha: @drocha has left the channel