#general
@myeole: I am trying to fetch Parquet files from S3 and load them into Pinot, using an offline table. I am running this command with my job spec: `./bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile examples/batch/metrics/ingestionJobSpec.yaml` I am seeing the following errors. Any idea how to solve this issue?
Jan 13, 2021 6:34:24 PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr using format: (.+) version ((.*) )?\(build ?(.*)\)
at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)
at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263)
at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:567)
at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:544)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:431)
at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:238)
at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:234)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadP
Failed to generate Pinot segment for file -
@fx19880617: The issue here is that Pinot uses the parquet-avro library to read the file, and that library doesn’t understand the INT96 type.
@fx19880617: Is it possible to convert it to INT64?
@myeole: You mean in the table schema?
@fx19880617: Yes. If it’s still not working, then we may need a fix to bypass INT96.
@myeole: I changed the column to BIGINT in the Parquet file and to a Unix timestamp in the input file, and I am using LONG in my schema, but I am still seeing this error. Any idea?
@myeole: @fx19880617 Failed to generate Pinot segment for file -
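For reference, the INT96-to-INT64 conversion suggested above can be sketched as follows. This is a minimal illustration, not something from the thread: it assumes the INT96 values are timestamps in the common legacy layout (8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number), and the function name `int96_to_epoch_millis` is hypothetical.

```python
import struct

JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01
MILLIS_PER_DAY = 86_400_000
NANOS_PER_MILLI = 1_000_000


def int96_to_epoch_millis(raw: bytes) -> int:
    """Convert a 12-byte legacy INT96 timestamp to epoch milliseconds.

    Assumed layout (hypothetical helper, not a Pinot or Parquet API):
    8 bytes of nanoseconds within the day, then a 4-byte Julian day,
    both little-endian.
    """
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Sub-millisecond precision is truncated.
    return days_since_epoch * MILLIS_PER_DAY + nanos_of_day // NANOS_PER_MILLI
```

The resulting value fits in a 64-bit integer, so it could be written back as an INT64 column and mapped to a LONG field in the Pinot schema.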
@zxcware: Hi team, is there a limit on the number of znodes per parent node in ZK today?
@g.kishore: Things typically start slowing down with ZK once you have hundreds of thousands of znodes.
@g.kishore: We have seen thousands of tables in production and it works fine.
#troubleshooting
@valentin: Hello, I’m having a weird query issue, when I try to query my cluster (via Pinot UI) with: ```SELECT "tmpId" from datasource_5ffdbf421eb80003001818fe WHERE "name" = "identify" AND "clientId" = "ef8e0112fbac1450776931712bdaad3bb0deb121" GROUP BY "tmpId" LIMIT 1``` The query is executed But with: ```SELECT "tmpId" from datasource_5ffdbf421eb80003001818fe WHERE "name" = "identify" AND "clientId" = "3f8e0112fbac1450776931712bdaad3bb0deb121" -- 3f8e0112fbac1450776931712bdaad3bb0deb121 GROUP BY "tmpId" LIMIT 1``` I get the following error: ```[ { "errorCode": 200, "message": "QueryExecutionError:\norg.antlr.v4.runtime.misc.ParseCancellationException\n\tat org.antlr.v4.runtime.BailErrorStrategy.recoverInline(BailErrorStrategy.java:66)\n\tat org.antlr.v4.runtime.Parser.match(Parser.java:203)\n\tat org.apache.pinot.pql.parsers.PQL2Parser._expression_(PQL2Parser.java:828)\n\tat org.apache.pinot.pql.parsers.PQL2Parser._expression_(PQL2Parser.java:745)\n\tat org.apache.pinot.pql.parsers.Pql2Compiler.parseToAstNode(Pql2Compiler.java:148)\n\tat org.apache.pinot.pql.parsers.Pql2Compiler.compileToExpressionTree(Pql2Compiler.java:153)\n\tat org.apache.pinot.common.request.transform.TransformExpressionTree.compileToExpressionTree(TransformExpressionTree.java:46)\n\tat org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleSubquery(BaseBrokerRequestHandler.java:471)\n\tat org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:215)\n\tat org.apache.pinot.broker.api.resources.PinotClientRequest.processSqlQueryPost(PinotClientRequest.java:155)\n\tat sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)" } 
]``` I don’t really understand the error and why it’s happening, the only thing that changes between 2 queries is the `clientId` value that starts with `ef` in the first query and starts with `3f` in the 2nd one
@mayanks: Any reason you are using PQL instead of SQL?
@valentin: I’m using SQL (via the pinot ui, the HTTP call is `
@mayanks: I see, the stack trace suggested otherwise, but that is for the transform, not the query. Not sure why this might happen just by changing two characters. Can you try removing the quotes from the literals?
@mayanks: Seems you have a typo in the second query? Check the clientId predicate (it has — character)
@valentin: You’re talking about the comment? I have the same issue without it: ```SELECT tmpId from datasource_5ffdbf421eb80003001818fe WHERE name = "identify" AND clientId = "3f8e0112fbac1450776931712bdaad3bb0deb121" GROUP BY tmpId LIMIT 1```
@mayanks: Ah, I’m on the phone so I didn’t see the syntax correctly. Seems like a bug, could you file an issue?
@g.kishore: are you using pql or sql?
@mayanks: SQL, however, from stack trace we see that we internally use PQL for transforms
@mayanks: FWIW, I can compile the query from IDE
@g.kishore: ```org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleSubquery(BaseBrokerRequestHandler.java:471)```
@g.kishore: there is no subquery here
@mayanks: Yeah, noticed that too. But the code doesn't have 'if'
@g.kishore: ok
@g.kishore: @jackie.jxt ^^
@jackie.jxt: @valentin Can you try single quotes instead of double quotes? In SQL, double quotes are for identifiers; you need single quotes for literals.
@contact: He opened an issue there:
@contact: @jackie.jxt Indeed, it works with single quotes around the literal value. Thanks for the insight.
@contact: However, I’m not sure why it sometimes gets "correctly" interpreted.
@jackie.jxt: @contact IIRC, it is not interpreted correctly; it just doesn’t throw an exception. `"name" = "identify"` will be interpreted as `name - identify = 0`, where both `name` and `identify` are treated as identifiers (columns).
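The quoting rule above can be sketched with a small query builder. This is only an illustration: the helper names `quote_identifier`, `quote_literal`, and `build_query` are hypothetical, and escaping by doubling the quote character follows standard SQL, which is assumed (not confirmed in this thread) to match Pinot's SQL parser.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in double quotes (identifier quoting)."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes (literal quoting)."""
    return "'" + value.replace("'", "''") + "'"


def build_query(table: str, column: str, filters: dict, limit: int = 1) -> str:
    """Build a GROUP BY query with equality filters on string columns."""
    where = " AND ".join(
        f"{quote_identifier(col)} = {quote_literal(val)}"
        for col, val in filters.items()
    )
    col = quote_identifier(column)
    return f"SELECT {col} FROM {table} WHERE {where} GROUP BY {col} LIMIT {int(limit)}"


query = build_query(
    "datasource_5ffdbf421eb80003001818fe",
    "tmpId",
    {"name": "identify", "clientId": "3f8e0112fbac1450776931712bdaad3bb0deb121"},
)
print(query)
```

With this shape, the `clientId` value is always sent as a string literal, so it can no longer be parsed as a column name regardless of which characters it starts with.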
@amitchopra: Hi, I’m trying to troubleshoot an issue. I have a K8s cluster set up with 4 Pinot server instances. I changed the server replicas to 2 and did a helm upgrade. Even though the servers in K8s have been reduced from 4 to 2, I still see the deleted ones in a bad state in the Pinot UI. Shouldn’t servers deleted from K8s be deleted from Pinot as well? Secondly, 2 of the segments are mapped to the deleted servers, and Pinot is not allowing me to drop those server instances manually. Those 2 segments are also in a bad state. Any ideas?
@npawar: You probably need to untag the old instances and do a rebalance. That should move all segments to the live ones, and only then will you be able to delete the instances.
@npawar: @fx19880617 are there any special considerations for a K8s setup other than this?
@npawar:
@npawar: guide for untag and rebalance ^^
@amitchopra: got it. Thanks
@fx19880617: Removing servers from K8s won’t delete the Pinot server instances automatically.
@fx19880617: Deleting Pinot server instances requires manual operations.
@amitchopra: ok, thanks
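The untag, rebalance, then delete sequence described above could be planned roughly like this. It is a sketch under assumptions: the controller REST paths (`/instances/{name}/updateTags`, `/tables/{name}/rebalance`, `/instances/{name}`) are recalled from the Pinot controller API and should be checked against your controller's Swagger UI, and the function name, controller URL, instance name, table name, and replacement tag are all hypothetical. The code only builds the request list; it does not send anything.

```python
from urllib.parse import quote, urlencode


def decommission_plan(controller: str, instance: str, tables: list, replacement_tag: str) -> list:
    """Return the ordered (method, url) calls to retire one server instance.

    Endpoint paths are assumptions; verify them against the controller's
    Swagger UI before use.
    """
    base = controller.rstrip("/")
    inst = quote(instance, safe="")
    steps = []
    # 1. Untag: replace the server tenant tags so no table selects this instance.
    steps.append(("PUT", f"{base}/instances/{inst}/updateTags?{urlencode({'tags': replacement_tag})}"))
    # 2. Rebalance each offline table so its segments move to the live servers.
    for table in tables:
        params = urlencode({"type": "OFFLINE", "dryRun": "false"})
        steps.append(("POST", f"{base}/tables/{quote(table, safe='')}/rebalance?{params}"))
    # 3. Drop the instance only after no segments are assigned to it.
    steps.append(("DELETE", f"{base}/instances/{inst}"))
    return steps


plan = decommission_plan(
    "http://localhost:9000/",        # hypothetical controller address
    "Server_pinot-server-3_8098",    # hypothetical instance name
    ["metrics"],                     # hypothetical table name
    "untagged",                      # hypothetical replacement tag
)
for method, url in plan:
    print(method, url)
```

Keeping the plan separate from execution makes it easy to review the calls, or to run the rebalance with `dryRun` set to `true` first.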
