#general


@akshay13jain: @akshay13jain has joined the channel
@zaid.mohemmad: @zaid.mohemmad has joined the channel
@karinwolok1: Meetup tomorrow (intro level), if anyone wants to join. Feel free to invite friends as well if you think they would benefit from Apache Pinot. :heart:
@karinwolok1: :speaker: Reminder - this conference has an open CFP (looking for speakers). CFP deadline is this week. Be a conference speaker and share your story about Apache Pinot
@dino.occhialini: @dino.occhialini has joined the channel
@scott.cohen: @scott.cohen has joined the channel
@aaron.weiss: @aaron.weiss has joined the channel

#random


@akshay13jain: @akshay13jain has joined the channel
@zaid.mohemmad: @zaid.mohemmad has joined the channel
@dino.occhialini: @dino.occhialini has joined the channel
@scott.cohen: @scott.cohen has joined the channel
@aaron.weiss: @aaron.weiss has joined the channel

#troubleshooting


@nair.a: Hey team, I have a few questions, can someone help?
1) Queries are not returning results most of the time. Checking the broker logs, I found the following: ```Failed to find servers hosting segment: mytable_0_8_20211029T2056Z for table: mytable_REALTIME (all ONLINE/CONSUMING instances: [] are disabled, but find enabled OFFLINE instance: Server_ip_8098 from OFFLINE instances: [Server_ip_8098], not counting the segment as unavailable)``` Is this a query timeout case?
2) I have set flush.threshold.size to 10 million, but segments are getting created with fewer rows (Total docs: 3.4 million). Is this expected?
3) What type of index is recommended on a realtime table with upsert mode on?
4) In upsert mode, is there any limitation on the comparison time column, i.e. timestamp format or granularity? My table date column is in yyyyMMddHH format; the comparison time column will be a timestamp in yyyy-MM-dd HH:mm:ss format. ```{ "upsertConfig": { "mode": "FULL", "comparisonColumn": "anotherTimeColumn" } }```
5) Queries are timing out at 10s even after changing the values at the broker and server level. Do any other configs need to be changed? pinot.broker.timeoutMs, pinot.server.query.executor.timeout
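A minimal sketch of where the two timeout properties from question 5 live (file names are typical defaults and may differ per deployment; the 30000 values are placeholders, and both configs are read at startup, so the broker and servers need a restart to pick up changes):
```
# Broker instance config (e.g. pinot-broker.conf)
pinot.broker.timeoutMs=30000

# Server instance config (e.g. pinot-server.conf)
pinot.server.query.executor.timeout=30000
```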
@alihaydar.atil: Hello everyone, I am using version 0.7.1. I am trying to create a hybrid table. Do I have to put controller.task.frequencyInSeconds in my controller config file? It says it is deprecated in the configuration reference.
  @xiangfu0: Not necessarily
  @xiangfu0: You just need to follow the doc to create the corresponding real-time table and offline table
  @alihaydar.atil: @xiangfu0 Thanks)
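For reference, a hybrid table is simply an OFFLINE and a REALTIME table config that share the same table name; a minimal sketch (table name, time column and replication are placeholders, and the realtime side's Kafka consumer settings under `streamConfigs` are omitted):
```
{
  "tableName": "mytable",
  "tableType": "OFFLINE",
  "segmentsConfig": { "timeColumnName": "ts", "replication": "1" },
  "tenants": {},
  "tableIndexConfig": { "loadMode": "MMAP" },
  "metadata": {}
}

{
  "tableName": "mytable",
  "tableType": "REALTIME",
  "segmentsConfig": { "timeColumnName": "ts", "replication": "1" },
  "tenants": {},
  "tableIndexConfig": { "loadMode": "MMAP", "streamConfigs": { "streamType": "kafka" } },
  "metadata": {}
}
```
Queries against `mytable` are then fanned out by the broker to both the offline and the realtime half.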
@akshay13jain: @akshay13jain has joined the channel
@zaid.mohemmad: @zaid.mohemmad has joined the channel
@dadelcas: Hey team, I've got an Avro schema which contains an array of records in a child field. I want to convert this to JSON during ingestion, so I've added a transformation for this column to my realtime table. I've specified `$` as my complex type delimiter because I've got some Groovy transformations that I need to apply to other columns, and it's the only delimiter I can use to make my field names compatible with Groovy identifiers. My config looks like: ```... "complexTypeConfig": { "delimiter": "$", ... }, "transformConfigs": [ ... { "columnName": "some_field", "transformFunction": "json_format(parent_field$some_field)" } ... ], ...```
  @dadelcas: This is not working for me. It seems messages are dropped because the json_format function can't be applied
  @dadelcas: Any pointers on how I should do this?
  @dadelcas: By the way, I've tried using `__` (double underscore) as my complex type delimiter but the complex type transformer didn't like that and was unable to extract the values resulting in null columns. I didn't spot anything strange in the code so I was wondering why this delimiter can't be used
  @g.kishore: i think $ sign is causing some issues
  @g.kishore: try escaping
  @dadelcas: The endpoint rejects $ and won't let me create the table. I've tried adding \ in front of it and got a different error
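For context, the ingestion config under discussion has roughly this shape once fully expanded (column names are the ones from the thread; `fieldsToUnnest` is shown empty as an assumption):
```
"ingestionConfig": {
  "complexTypeConfig": {
    "delimiter": "$",
    "fieldsToUnnest": []
  },
  "transformConfigs": [
    {
      "columnName": "some_field",
      "transformFunction": "json_format(parent_field$some_field)"
    }
  ]
}
```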
@vibhor.jain: Hi Team, we have a hybrid table for our analytics use case and were using UPSERT for the REALTIME table. It was working perfectly fine in 0.8. When the minion was moving data to OFFLINE, we were using mergeType: "dedup" and duplicates were getting eliminated in the OFFLINE flow as well. When we upgraded to 0.9, UPSERT is no longer supported for hybrid tables. This is blocking our table deployment. We understand UPSERT cannot work for an OFFLINE table, but why is it blocked for hybrid tables? Can someone clarify if we are missing something here?
  @mayanks: If a row is being upserted (via RT ingestion today), but the previous row for the pk has been moved to offline part, then the upsert won't work.
  @mayanks: @walterddr, we might relax this check via a config. Reason being there are cases where it may be OK to limit upsert to RT retention time.
  @mayanks: @vibhor.jain Please note though, 0.9 is not officially released yet.
  @walterddr: relaxing this check for now:
  @npawar: In my opinion, we should not support hybrid table + upsert. This particular case is an exception (a combination of the dedup functionality and realtime retention), and there are still cases where it won't work (as pointed out by Mayank). It will not work properly for the majority of use cases.
  @walterddr: What would be the error message for the failure use case Mayank mentioned? I guess as long as the task itself errors out with a proper error message indicating the issue, we should be OK to remove this constraint from the validation phase.
  @walterddr: (Or we can provide a warning log? I don't know if this would be of much help, but it's worth at least logging it somewhere.)
  @npawar: no error message. upsert just won’t work
  @walterddr: @vibhor.jain can you describe exactly what behavior you want to achieve with this realtime to offline transfer together with upsert? I am not 100% sure we sorted out all the scenarios
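For reference, the setup described above combines an upsert-enabled REALTIME table with the minion RealtimeToOfflineSegmentsTask and a dedup merge, roughly like this (the time periods are placeholder values):
```
"upsertConfig": { "mode": "FULL" },
"task": {
  "taskTypeConfigsMap": {
    "RealtimeToOfflineSegmentsTask": {
      "bucketTimePeriod": "1d",
      "bufferTimePeriod": "2d",
      "mergeType": "dedup"
    }
  }
}
```
As noted in the thread, once a primary key's earlier rows have been moved to the OFFLINE table, a later upsert on the REALTIME side can no longer replace them, so this pattern only behaves as expected while all versions of a key fall within the realtime retention window.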
@luisfernandez: How do I know if a segment is too big?
  @mayanks:
  @luisfernandez: :pray: thank you!
@walterddr: @walterddr has joined the channel
@luisfernandez: In the logs I'm observing ```
2021-11-09 12:53:00 Slow query: request handler processing time: 441, send response latency: 1, total time to handle request: 442
2021-11-09 12:53:00 Processed requestId=1975257,table=etsyads_metrics_REALTIME,segments(queried/processed/matched/consuming)=46/46/46/1,schedulerWaitMs=0,reqDeserMs=0,totalExecMs=441,resSerMs=0,totalTimeMs=441,minConsumingFreshnessMs=1636480380211,broker=Broker_pinot-broker-1.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=20584,scanInFilter=0,scanPostFilter=123504,sched=fcfs,threadCpuTimeNs=0
``` I was able to then find the request id in the broker and got some more info: ```
requestId=1976569,table=ads_metrics_REALTIME,timeMs=234,docs=17731/290711208,entries=0/106386,segments(queried/processed/matched/consuming/unavailable):46/46/46/1/0,consumingFreshnessTimeMs=1636480906334,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-1_R=0,233,7479,0,-1,offlineThreadCpuTimeNs=0,realtimeThreadCpuTimeNs=0,query=SELECT product_id, SUM(click_count), SUM(impression_count), SUM(cost), SUM(order_count), SUM(revenue) FROM ads_metrics WHERE user_id = 13133627 AND serve_time BETWEEN 1633924800 AND 1636520399 GROUP BY product_id LIMIT 6000
``` Is there any way I could tell from these logs why this is slow? The only thing I can see is `scanPostFilter=123504`, which may be happening because of the group by. I believe we currently do not have any indexes on that product_id column; would adding one speed things up in any way?
  @richard892: could you get a profile from the server process while querying it? e.g. `jcmd <pid> JFR.start duration=60s filename=server.jfr` then copy the jfr file off the box? or if you have async-profiler installed already that would be better
  @luisfernandez: can i profile with jvisualvm?
  @richard892: preferably not, it's super high overhead
  @richard892: and inaccurate
  @richard892: do you have a JDK with jcmd?
  @richard892: if you know the pid of the server process, profiling with JFR is as easy as the command above
  @luisfernandez: i do have jcmd i think
  @luisfernandez: i did
  @luisfernandez: `pidof java` can i do that lol
  @luisfernandez: seems like there’s no `ps aux` in this machine
  @luisfernandez: i’m using whatever configuration is there in the helm chart
  @richard892: jps should identify the server process
  @luisfernandez: oh yea i think we have somethin
  @luisfernandez: so how do i read the server.jfr
  @richard892: you can load it in JMC
  @richard892: download it from the server first
  @luisfernandez:
  @luisfernandez: is something like this what i’m supposed to see in jmc
  @richard892: yes it gives rule based advice
  @richard892: there won't be anything sensitive in the file, if possible please send it to me privately and I'll take a look tomorrow
  @richard892: the best place to look is method profiling
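On the indexing question from the start of the thread: adding an inverted index would be a table-config change along these lines (a sketch; `user_id` is included on the assumption that it isn't already a sorted column, and since `product_id` only appears in the GROUP BY rather than the WHERE clause, an index on it may not reduce `scanPostFilter` much):
```
"tableIndexConfig": {
  "loadMode": "MMAP",
  "invertedIndexColumns": ["user_id", "product_id"]
}
```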
@dino.occhialini: @dino.occhialini has joined the channel
@scott.cohen: @scott.cohen has joined the channel
@aaron.weiss: @aaron.weiss has joined the channel

#pinot-dev


@akshay13jain: @akshay13jain has joined the channel
@xiangfu0: seems the kinesis test doesn’t work again @kharekartik
@xiangfu0: Tried upgrading the localstack version, but it's not working this time. Maybe let's not enable it by default?
@kharekartik: Yeah, we can disable the integration test for now. Can you send me the error log?
@xiangfu0: It's the same as last time, the startKinesis method doesn't move on
@xiangfu0: ```
cloud.localstack.docker.exception.LocalstackDockerException: Could not start the localstack docker container.
    at cloud.localstack.Localstack.startup(Localstack.java:104)
    at org.apache.pinot.integration.tests.RealtimeKinesisIntegrationTest.startKinesis(RealtimeKinesisIntegrationTest.java:224)
    at org.apache.pinot.integration.tests.RealtimeKinesisIntegrationTest.setUp(RealtimeKinesisIntegrationTest.java:135)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:108)
    at org.testng.internal.Invoker.invokeConfigurationMethod(Invoker.java:523)
    at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:224)
    at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:146)
    at org.testng.internal.TestMethodWorker.invokeBeforeClassMethods(TestMethodWorker.java:166)
    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:105)
    at org.testng.TestRunner.privateRun(TestRunner.java:744)
    at org.testng.TestRunner.run(TestRunner.java:602)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:380)
    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:375)
    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:340)
    at org.testng.SuiteRunner.run(SuiteRunner.java:289)
    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1301)
    at org.testng.TestNG.runSuitesLocally(TestNG.java:1226)
    at org.testng.TestNG.runSuites(TestNG.java:1144)
    at org.testng.TestNG.run(TestNG.java:1115)
    at com.intellij.rt.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:66)
    at com.intellij.rt.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:109)
```
@xiangfu0: I’ve merged the PR: to disable KinesisTest for now. Please rebase your PRs if you are encountering the CI failure.

#getting-started


@akshay13jain: @akshay13jain has joined the channel
@bagi.priyank: Does the query console only show limited results for a query? I am wondering why I am seeing only some rows in the results for a query like ```SELECT col1, col2, col3, DISTINCTCOUNT(col4) AS distinct_col4 FROM table GROUP BY col1, col2, col3```. The star-tree index looks like ```"starTreeIndexConfigs": [ { "dimensionsSplitOrder": [ "col1", "col2", "col3" ], "skipStarNodeCreationForDimensions": [], "functionColumnPairs": [ "DISTINCTCOUNT__col4" ], "maxLeafRecords": 1 } ],```. Can I also add `DistinctCountHLL__col4` and `DistinctCountThetaSketch__col4` to `functionColumnPairs` and evaluate the performance of all 3 for this query?
@jackie.jxt: Startree only supports `distinctcounthll` because its intermediate result size is bounded
@jackie.jxt: You need to add `limit` to the query, or it defaults to 10
@bagi.priyank: Oh no theta sketch either?
  @npawar: This is the list of supported functions. No theta sketch yet
@bagi.priyank: And thank you Jackie!
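Putting the two answers together: per Jackie, the distinct-count pair in the star-tree needs to use the HLL variant (theta sketch is not supported there), and the query needs an explicit limit larger than the default of 10, e.g. `... GROUP BY col1, col2, col3 LIMIT 1000`, with `DISTINCTCOUNTHLL(col4)` in the select list so the HLL pair is used. A sketch of the adjusted config:
```
"starTreeIndexConfigs": [
  {
    "dimensionsSplitOrder": ["col1", "col2", "col3"],
    "skipStarNodeCreationForDimensions": [],
    "functionColumnPairs": ["DISTINCTCOUNTHLL__col4"],
    "maxLeafRecords": 1
  }
]
```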

#releases


@akshay13jain: @akshay13jain has joined the channel