#general
@zyedmohammedanees: @zyedmohammedanees has joined the channel
@joshhighley: @joshhighley has joined the channel
@bowlesns: I put in a request for Gitbook access, if someone could check on that so I can start contributing to the docs I would appreciate it :slightly_smiling_face:
@g.kishore: Added
@bowlesns: Thanks!
@ken: Really interesting article about Uber doing schema-agnostic log aggregations…but they went with ClickHouse, not Pinot?!?
@g.kishore: Looks like they made this decision before Pinot added support for JSON structures. The design in that article looks complex; using JSON indexing is probably much simpler.
@yupeng: yeah, at the time when they made the decision, json index was not available in pinot yet.
@dutta.kinshuk: @dutta.kinshuk has joined the channel
#random
@zyedmohammedanees: @zyedmohammedanees has joined the channel
@joshhighley: @joshhighley has joined the channel
@dutta.kinshuk: @dutta.kinshuk has joined the channel
#troubleshooting
@zyedmohammedanees: @zyedmohammedanees has joined the channel
@joshhighley: @joshhighley has joined the channel
@joshhighley: New to Pinot. We're using the Pinot Docker images. We've created offline tables successfully, but can't create a realtime table. The segment status is 'bad'. There are no error messages in the logs for the broker, controller, or server, so I'm stuck on how to debug this.
@bowlesns: Not sure what your setup is but I was getting some bad segments and after I gave some of the components some more resources that helped. Try reloading the segment (in the UI you click on the segment, and then reload) and see if it reloads successfully.
@bowlesns: How are you ingesting/creating the segments?
@joshhighley: I've tried reloading with no success. Data will eventually come from a Kafka topic (different server). Kafka authentication is a bit complex: SASL_SSL with a user id/password and a corporate self-signed CA for the SSL cert. I've added the certificate authority to the Docker images' cacerts file, `/usr/local/openjdk-8/jre/lib/security/cacerts`, and I've specified the id/pw in the streamConfigs as `stream.kafka.username` and `stream.kafka.password`.
@joshhighley: I would expect a connection error to be in the logs somewhere though
@chinmay.cerebro: @fx19880617 @jackie.jxt any pointers here ^^ ?
@fx19880617: how do you connect to kafka cluster? does the connection open?
@fx19880617: also if you deploy pinot controller/server separately, then you need to configure kafka auth in both controller/server
@ken: @joshhighley FWIW, I’ve been fooled into thinking there was no logging output before with Pinot, due to `<RandomAccessFile name="controllerLog" fileName="pinotController.log" immediateFlush="false">` in the log configuration file(s). Since `immediateFlush` is false, a connection error wouldn’t show up right away in the logs.
@joshhighley: I've managed to find an error message in the Controller log. "org.apache.kafka.common.errors.TimeoutException:Timeout expired while fetching topic metadata". I have added the certificates to the Controller container's cacerts and restarted it. Still no success. However, I was able to start the Kafka container as described in the Pinot docs Manual Cluster Setup and successfully connect the realtime table to that Kafka instance. So, this seems to be some kind of connectivity or authentication issue with the other Kafka server.
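For the SASL_SSL setup described above, one thing that may be worth checking is how the credentials reach the Kafka client. Kafka clients conventionally read them from the standard properties `security.protocol`, `sasl.mechanism`, and `sasl.jaas.config`. The sketch below only builds such a property map. It assumes the `PLAIN` mechanism (the thread does not say which mechanism the cluster uses) and assumes the properties would be merged into the table's streamConfigs on every component that talks to Kafka. The helper name `kafka_sasl_ssl_props` is hypothetical, and whether Pinot forwards these keys should be confirmed against the Pinot docs.

```python
import json


def kafka_sasl_ssl_props(username, password,
                         truststore_path=None, truststore_password=None):
    """Build standard Kafka client properties for SASL_SSL.

    Assumption: the cluster uses the PLAIN mechanism. Swap the login
    module and mechanism if it uses SCRAM or something else.
    """
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{username}" password="{password}";'
    )
    props = {
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": jaas,
    }
    # Optional: point the client at an explicit truststore instead of
    # relying on the JVM default cacerts file.
    if truststore_path is not None:
        props["ssl.truststore.location"] = truststore_path
        if truststore_password is not None:
            props["ssl.truststore.password"] = truststore_password
    return props


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    print(json.dumps(kafka_sasl_ssl_props("my-user", "my-password"), indent=2))
```

The same map would need to be visible to both the controller and the server when they are deployed separately, as noted earlier in the thread.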
@bowlesns: Getting some failed tasks when doing a batch ingestion, and looking at the minion logs I see a lot of `Caused by:
@bowlesns:
@fx19880617: seems to be segment push error
@fx19880617: do you have minion config to check what’s the push segment uri?
@bowlesns: Sure let me post after this meeting. It’s the same minion config that was working earlier, but I did change some transformConfig for the table
@bowlesns: I’m also using 100 minions :stuck_out_tongue:
@fx19880617: it could also be that too many parallel pushes occupied the controller threads, which caused the timeout
@fx19880617: We observed this issue during data bootstrapping of super huge data set
@fx19880617: one thing you can optimize is to set configs to store segments to deep store like s3 or gcs then do URI push
@fx19880617: default is tar push, which is reliable but costly :slightly_smiling_face:
@bowlesns: I’ve got it set to URI now. Believe I tried METADATA and TAR before but it complained. Will try again, thanks for the help.
@fx19880617: got it, then the only thing I can think of is to increase the default retry
@fx19880617: also you are seeing some segments being added, right?
@bowlesns: Correct it was working before and a few segments would be “BAD” but after reloading they were fine, so sounds like it could be the threads issue you mentioned.
@bowlesns: But then I scaled the minions way up from like 20 to 100 to see how quickly I can ingest the data
@fx19880617: :stuck_out_tongue:
@fx19880617: got it
@fx19880617: per table level, for idealstates update, this is single point of processing
@fx19880617: so all threads will be working on the same zNode
@bowlesns: Should I try to scale zookeeper to speed up how fast the entries are done? Gave it more heap size as a precaution.
@bowlesns: doesn’t look like it’s using many resources
@fx19880617: you can give more cpu and memory and see if that helps the speed
@bowlesns: This is during a `SegmentGenerationAndPushTask`
@chundong.wang: Ran into `IllegalStateException` when using string functions in where clause. :cry:
@chundong.wang: ```select currency_code from orders where SUBSTR(currency_code, 0, 2) <> 'US' limit 10```
@chundong.wang: Exception: ``` "message": "QueryExecutionError:\njava.lang.IllegalStateException: Caught exception while invoking method: public static java.lang.String org.apache.pinot.common.function.scalar.StringFunctions.substr(java.lang.String,int,int) with arguments: [, 0, 2]\n\tat org.apache.pinot.common.function.FunctionInvoker.invoke(FunctionInvoker.java:148)\n\tat org.apache.pinot.core.operator.transform.function.ScalarTransformFunctionWrapper.transformToStringValuesSV(ScalarTransformFunctionWrapper.java:209)\n\tat org.apache.pinot.core.operator.dociditerators.ExpressionScanDocIdIterator.processProjectionBlock(ExpressionScanDocIdIterator.java:167)\n\tat org.apache.pinot.core.operator.dociditerators.ExpressionScanDocIdIterator.next(ExpressionScanDocIdIterator.java:81)```
@jackie.jxt: Seems the problem is caused by calling `substring` on an empty string
@chundong.wang: It works if `substr` is in the select statement
@chundong.wang: ```select SUBSTR(currency_code, 0, 2) from order --where SUBSTR(currency_code, 0, 2) <> 'US' limit 10```
@chundong.wang: Is there a way to tell if it’s related to some of the segments?
@jackie.jxt: Can you try adding a filter `where currency_code = ''`?
@chundong.wang: yep! that’d go wrong too
@jackie.jxt: Javadoc for `String.substring()` ```Returns a string that is a substring of this string. The substring begins at the specified beginIndex and extends to the character at index endIndex - 1. Thus the length of the substring is endIndex-beginIndex. Examples: "hamburger".substring(4, 8) returns "urge" "smiles".substring(1, 5) returns "mile" Params: beginIndex – the beginning index, inclusive. endIndex – the ending index, exclusive. Returns: the specified substring. Throws: IndexOutOfBoundsException – if the beginIndex is negative, or endIndex is larger than the length of this String object, or beginIndex is larger than endIndex.```
@chundong.wang: hmm… but it seems I can’t exclude empty strings… ```where currency_code<>'' and SUBSTR(currency_code, 0, 2) <> 'US'```
@chundong.wang: ah i see
@chundong.wang: `SUBSTR(RPAD(currency_code, 5, ' '), 0, 2) <> 'US'` would work :sweat_smile:
@jackie.jxt: You can probably try `substr(currency_code, 0, min(2, length(currency_code)))`
@jackie.jxt: Yeah, that works too
@chundong.wang: that works too!
@chundong.wang: probably min/length is better than RPAD
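The failure and the `min`/`length` workaround above can be reproduced outside Pinot. The sketch below is a Python stand-in, not Pinot's code: `java_substring` is a hypothetical helper that mimics the bounds checks quoted from the `String.substring()` Javadoc, and `safe_prefix` applies the same clamping as `substr(currency_code, 0, min(2, length(currency_code)))`.

```python
def java_substring(s: str, begin: int, end: int) -> str:
    """Mimic the bounds checks of Java's String.substring(begin, end)."""
    if begin < 0 or end > len(s) or begin > end:
        raise IndexError(f"begin {begin}, end {end}, length {len(s)}")
    return s[begin:end]


def safe_prefix(s: str, n: int) -> str:
    """Clamp the end index to the string length before taking the prefix."""
    return java_substring(s, 0, min(n, len(s)))


if __name__ == "__main__":
    for value in ["USD", "U", ""]:
        try:
            unguarded = repr(java_substring(value, 0, 2))
        except IndexError as exc:
            unguarded = f"IndexError({exc})"
        print(repr(value), "unguarded:", unguarded,
              "guarded:", repr(safe_prefix(value, 2)))
```

By the Javadoc rule, a one-character value fails the unguarded call as well, since the end index 2 exceeds its length. The clamped version covers both that case and the empty string.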
@chundong.wang: Would such string be intern’ed?
@jackie.jxt: What does `intern'ed` mean here?
@chundong.wang: A single copy of each immutable string, that kind of intern
@chundong.wang: I’m wondering if `RPAD` would create more strings in memory than arithmetic operations like min/strlen
@jackie.jxt: Could be. We don't use `intern` explicitly
@jackie.jxt: You can try both and see which one runs faster
@chundong.wang: Got it. Thanks for the help!
@dutta.kinshuk: @dutta.kinshuk has joined the channel
