Apache Pinot Daily Email Digest (2022-05-25)

Pinot Slack Email Digest Wed, 25 May 2022 20:06:16 -0700

#general

@dangngoctan2012: @dangngoctan2012 has joined the channel
@priya.shivakumar: @priya.shivakumar has joined the channel
@arnaud.zdziobeck: @arnaud.zdziobeck has joined the channel
@ralph.debusmann967: One basic question about data preparation (for ingestion into Pinot) - how do you combine e.g. multiple Kafka topics into one table in Pinot so that you can query them as one - without having JOINs? Is there any way to do it without heavy upfront stream processing using e.g. Kafka Streams/ksqlDB/Flink/Materialize etc.?
@g.kishore: Do you want to simply union the two streams or perform some kind of join across the two streams?
@ralph.debusmann967: A union would be a good start (basically putting a bunch of Kafka topics into one table in Pinot), of course some kind of join would be even better. I'd just like to avoid having to use stream processing for this and just pull the data from various Kafka topics into Pinot and go from there :slightly_smiling_face:
@ralph.debusmann967: How is this done in LinkedIn for example?
@g.kishore: It’s a Samza job that does the join and writes back to Kafka
@ralph.debusmann967: Thanks! And what if I don't want to add a stream processing component to my architecture - what options would you recommend?
@g.kishore: Depends on whether it is a join or a simple union of two topics
@ralph.debusmann967: So in our case we have e.g. one topic of daily aggregated Twitter sentiments and one topic of daily aggregated copper prices (simplified example). It could be that one of the time series has a different starting point compared to the other - e.g. the Twitter sentiments would start in 2015 and the copper prices in 1990. Would it be possible to bring the data starting from 2015 together into one Pinot table with the union operation?
@g.kishore: You can write a plug-in that is a composite consumer across multiple topics
@ralph.debusmann967: Cool thanks - I'll try that :grinning:
@g.kishore: happy to help if you can share the PR or a github.
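A minimal sketch of what a plain union of the two series could look like, assuming each topic's records are tagged with a `source` column and rows before a chosen start date are dropped. Plain collections stand in for the Kafka topics and the Pinot table; `TopicUnionSketch`, `Row`, and `union` are hypothetical names and not part of any Pinot plug-in API:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative only: plain collections stand in for Kafka topics and a Pinot table.
class TopicUnionSketch {
    // One row of the hypothetical unified table; `source` records the originating topic.
    static final class Row {
        final String source;
        final LocalDate day;
        final double value;

        Row(String source, LocalDate day, double value) {
            this.source = source;
            this.day = day;
            this.value = value;
        }
    }

    // Unions all topics into one row list, dropping rows dated before `from`.
    static List<Row> union(Map<String, Map<LocalDate, Double>> topics, LocalDate from) {
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, Map<LocalDate, Double>> topic : topics.entrySet()) {
            for (Map.Entry<LocalDate, Double> e : topic.getValue().entrySet()) {
                if (!e.getKey().isBefore(from)) {
                    rows.add(new Row(topic.getKey(), e.getKey(), e.getValue()));
                }
            }
        }
        // Sort by day, then by source, so the output order is deterministic.
        rows.sort(Comparator.comparing((Row r) -> r.day).thenComparing(r -> r.source));
        return rows;
    }
}
```

With a `source` column in place, a query can filter or group by topic without a JOIN. A real composite consumer would also have to track offsets per topic, which this sketch ignores.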
@ysuo: Hi team, I have a question about using Pinot JDBC to connect to a Pinot controller deployed in K8s.
```
DriverManager.registerDriver(new PinotDriver());
// Connection conn = DriverManager.getConnection(DB_URL); // will query DefaultTenant if no tenant is specified here
Properties info = new Properties();
info.putIfAbsent("tenant", "TestBroker");
Connection conn = DriverManager.getConnection(DB_URL, info);
Statement statement = conn.createStatement();
```
But the following error is returned:
```
Caused by: java.net.UnknownHostException: pinot-broker-8.pinot-broker-headless.pinot.svc.cluster.local: nodename nor servname provided, or not known
	at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
	at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)
	at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1515)
	at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)
	at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1505)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1364)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1298)
	at java.base/java.net.InetAddress.getByName(InetAddress.java:1248)
	at com.ning.http.client.NameResolver$JdkNameResolver.resolve(NameResolver.java:28)
	at com.ning.http.client.providers.netty.request.NettyRequestSender.remoteAddress(NettyRequestSender.java:359)
	at com.ning.http.client.providers.netty.request.NettyRequestSender.connect(NettyRequestSender.java:370)
	at com.ning.http.client.providers.netty.request.NettyRequestSender.sendRequestWithNewChannel(NettyRequestSender.java:282)
	... 12 more
```
Is there any configuration to make sure the controller returns the right, accessible broker URL?
@ysuo: I’m locally testing connecting to pinot controller deployed in K8s cluster.
@teehan: @teehan has joined the channel
@matthew: @matthew has joined the channel
@tommaso.peresson: @tommaso.peresson has joined the channel
@ysuo: I tried locally to execute a query via the broker with the command below. `curl -H "Content-Type: application/json" -X POST -d '{"sql":"select * from action limit 1"}'` This Pinot cluster is deployed in a K8s cluster and has 4 brokers. is the exposed broker gateway address. The table named `action` is assigned to a tenant named `tenanta`, which has only one broker. When I run the above command, I sometimes get the right results. But most of the time, the following error is returned: `org.apache.pinot.client.PinotClientException: Query had processing exceptions: [{"message":"BrokerResourceMissingError","errorCode":410}] at org.apache.pinot.client.Connection.execute(Connection.java:127) at com.bigdata.PinotJava.main(PinotJava.java:53)` Is there some configuration I’m missing here? Any idea how to fix it?
@xiangfu0: hi, for your case, you need to create a new service e.g. `pinot-broker-tenanta` with different node selector to pick the right pinot broker. Then you can query the exposed service or loadbalancer for that pinot broker. Current k8s setup is for pure shared tenant.
@ysuo: Hi, since sometimes this command could return the right result, is it because the table tenant is matched that time? I mean, in my test, I got one time right, and three times ‘BrokerResourceMissingError’, and it followed this pattern when I tried more times. So, is there a possibility to set tenant as a command parameter?
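The one-success-then-three-errors pattern described here is consistent with a service that rotates requests across all four brokers when only one of them serves `tenanta`. A minimal simulation of that idea, assuming strict round-robin (real load balancers may distribute requests differently). It does not talk to Pinot or Kubernetes, and the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative simulation only; it does not call Pinot or Kubernetes.
class BrokerRoundRobinSketch {
    // Returns one outcome per request when a service rotates over `brokerCount` brokers
    // and only the broker at index `tenantBrokerIndex` serves the table's tenant.
    static List<String> simulate(int brokerCount, int tenantBrokerIndex, int requests) {
        List<String> outcomes = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            int broker = i % brokerCount; // strict round-robin is an assumption
            outcomes.add(broker == tenantBrokerIndex ? "OK" : "BrokerResourceMissingError");
        }
        return outcomes;
    }

    public static void main(String[] args) {
        // Four brokers, broker 0 serves the tenant, eight requests.
        System.out.println(simulate(4, 0, 8));
    }
}
```

Under that assumption, routing requests only to the tenant's broker (for example through a dedicated service, as suggested above) removes the failing cases.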
@cesaro.angelo: @cesaro.angelo has joined the channel
@ghita.saouir: @ghita.saouir has joined the channel
@m.ram3sh: @m.ram3sh has joined the channel
@sonam.dp42: @sonam.dp42 has joined the channel
@rbobbala: Hello Team, Can someone help me with the below error ```Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "Ingress" in version "extensions/v1beta1"```
@mayanks: @xiangfu0
@xiangfu0: I think this is due to the ingress upgrade for k8s
@xiangfu0: did you happen to enable the ingress here:
@rbobbala: Yes
@rbobbala: I tried to enable the ingress to access it from the browser
@xiangfu0: I think your k8s is on a higher version that doesn’t support the current ingress
@xiangfu0: cc: @diana.arnos
@rbobbala: what is the version this supports
@rbobbala: ?
@xiangfu0:
@xiangfu0: current implementation is `apiVersion: extensions/v1beta1`
@rbobbala: Yes I want to know the K8 version that supports extensions/v1beta1
@xiangfu0: We should upgrade this to ```apiVersion: ```
@rbobbala: Thanks for sharing
@rbobbala: Can I change the ingress API version?
@rbobbala: Just confused about how I can modify the ingress.yaml file just for my deployment
@rbobbala: The Helm chart uses the templates from the repo, right?
@rbobbala: Wondering how can I modify the templates and make use of Helm to install
@rbobbala: or the alternative way is to install k8 cluster with an older version that supports apiVersion: extensions/v1beta1
@xiangfu0: Yes, you can modify the chart
@rbobbala: okay
@xiangfu0: In short the chart is just a template to be installed
@xiangfu0: helm will apply values to the template
@rbobbala: Got it
@xiangfu0: values.yaml has the values and flags
@xiangfu0: I can review the change if you would like to contribute it as well
@rbobbala: Hope it doesn't mess anything up if I change the apiVersion to ```apiVersion: ```
@xiangfu0: you need to change the rest of the file
@xiangfu0: just changing apiVersion doesn’t work
@rbobbala: okay
@xiangfu0: you can follow to add another section after
@rbobbala: Thanks for sharing

#random

@dangngoctan2012: @dangngoctan2012 has joined the channel
@priya.shivakumar: @priya.shivakumar has joined the channel
@arnaud.zdziobeck: @arnaud.zdziobeck has joined the channel
@teehan: @teehan has joined the channel
@matthew: @matthew has joined the channel
@tommaso.peresson: @tommaso.peresson has joined the channel
@cesaro.angelo: @cesaro.angelo has joined the channel
@ghita.saouir: @ghita.saouir has joined the channel
@m.ram3sh: @m.ram3sh has joined the channel
@sonam.dp42: @sonam.dp42 has joined the channel

#feat-presto-connector

@gaetanmorlet: @gaetanmorlet has joined the channel

#pinot-power-bi

@gaetanmorlet: @gaetanmorlet has joined the channel

#troubleshooting

@dangngoctan2012: @dangngoctan2012 has joined the channel
@priya.shivakumar: @priya.shivakumar has joined the channel
@arnaud.zdziobeck: @arnaud.zdziobeck has joined the channel
@teehan: @teehan has joined the channel
@matthew: @matthew has joined the channel
@tommaso.peresson: @tommaso.peresson has joined the channel
@lars-kristian_svenoy: Hey team :wave: . I'm currently in the process of writing a custom flink job which is able to atomically replace the segments for a pinot refresh table. I've been looking into the segment replacement protocol, and wanted to see if I understand this correctly.. More info in thread
@lars-kristian_svenoy: So prior to uploading segments, I should call startReplaceSegments. Then after that has been called, can I then start calling uploadSegment? I guess in this case, I should be uploading segments to some other directory/bucket (s3). Once this is all done, do I then call endReplaceSegments? What do I do if there is a failure while uploading segments? Anything else I should know? Thank you all
@g.kishore: This is needed for batch replacement of segments in an atomic way. If you just want to replace one segment at a time, you can just call upload segment.
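The ordering asked about in this thread (start, upload, end) can be sketched as follows. This is a sketch under assumptions, not Pinot's client API: `SegmentReplaceClient` is a hypothetical interface whose method names mirror the calls mentioned above, and the `revertReplaceSegments` step on failure is an assumption about how a failed upload would be handled, to be checked against the controller API of the Pinot version in use. `RecordingClient` is an in-memory stand-in that only records the calls:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client interface; NOT Pinot's actual Java API.
interface SegmentReplaceClient {
    String startReplaceSegments(List<String> from, List<String> to); // returns an entry id
    void uploadSegment(String segmentName) throws Exception;
    void endReplaceSegments(String entryId);
    void revertReplaceSegments(String entryId); // assumed failure-handling call
}

class AtomicReplaceSketch {
    // Returns true if the swap completed, false if it was reverted after a failed upload.
    static boolean replace(SegmentReplaceClient client, List<String> oldSegs, List<String> newSegs) {
        String entryId = client.startReplaceSegments(oldSegs, newSegs);
        try {
            for (String segment : newSegs) {
                client.uploadSegment(segment);
            }
        } catch (Exception e) {
            client.revertReplaceSegments(entryId); // undo the in-progress replacement
            return false;
        }
        client.endReplaceSegments(entryId); // make the new segments visible together
        return true;
    }
}

// In-memory stand-in that records the calls it receives.
class RecordingClient implements SegmentReplaceClient {
    final List<String> calls = new ArrayList<>();
    private final String failOn; // segment name whose upload should fail, or null

    RecordingClient(String failOn) {
        this.failOn = failOn;
    }

    public String startReplaceSegments(List<String> from, List<String> to) {
        calls.add("start");
        return "entry-1";
    }

    public void uploadSegment(String segmentName) throws Exception {
        if (segmentName.equals(failOn)) {
            throw new Exception("upload failed: " + segmentName);
        }
        calls.add("upload:" + segmentName);
    }

    public void endReplaceSegments(String entryId) {
        calls.add("end");
    }

    public void revertReplaceSegments(String entryId) {
        calls.add("revert");
    }
}
```

Keeping the protocol behind an interface makes the failure path easy to exercise without a running cluster. A Flink job would supply an implementation backed by the controller's REST endpoints.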
@cesaro.angelo: @cesaro.angelo has joined the channel
@tommaso.peresson: Hi everyone. I'm currently setting up a table that has an MV column called `users` containing a list of `user_id` values. From what I've tried, `distinctcounthllmv()` can't be used as an aggregation function in a star-tree index. Has anyone faced a similar problem? If so, how did you solve it? Is it possible to calculate the raw HLL state at ingestion time and then perform the estimation at query time? Thanks everyone for helping.
@mayanks: Yes you can have an hll column in ingested data and can still query it using hll function
@tommaso.peresson: do you have any documentation on how to perform this?
@mayanks: @jackie.jxt
@ghita.saouir: @ghita.saouir has joined the channel
@m.ram3sh: @m.ram3sh has joined the channel
@sonam.dp42: @sonam.dp42 has joined the channel

#pinot-dev

@dadelcas: Hello, I'm reading the freshness metrics design document, which is something we are thinking of using in one of our use cases. However, the freshness timestamp returned by Pinot always seems to be the Pinot indexing time. Reading through the code, it seems there isn't a row metadata implementation for Kafka. I'd like to confirm this is the case and, if so, contribute the code changes to get this working as per the design document. I can't see an open issue in GitHub related to this.
@g.kishore: I thought it used timestamp from row metadata if it’s available
@dadelcas: Yup, it does choose indexing timestamp if row metadata is not available. Doesn't seem like any of the stream ingestion plugins returns row metadata at the moment
@dadelcas: The default implementation returns null
@dadelcas: I'm going to raise an issue in github and open a PR, I'll post the link here if further discussion is needed
@g.kishore: :+1:
@dadelcas: This is the PR, I didn't write a lot of details on it nor in the linked github issue. Apologies for the rush
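The fallback discussed in this thread can be sketched as below: use the timestamp carried by the stream's row metadata when it exists, otherwise fall back to the indexing time. `RowMeta` and `freshnessTimeMs` are hypothetical names chosen for illustration and are not Pinot's actual types. A missing metadata implementation is modelled as `null`, matching the "default implementation returns null" remark above:

```java
// Illustrative only: RowMeta is a hypothetical stand-in, not Pinot's row metadata type.
class FreshnessSketch {
    static final class RowMeta {
        final long ingestionTimeMs;

        RowMeta(long ingestionTimeMs) {
            this.ingestionTimeMs = ingestionTimeMs;
        }
    }

    // Prefer the stream-provided timestamp; fall back to the indexing time when the
    // stream plugin supplies no metadata (modelled here as null).
    static long freshnessTimeMs(RowMeta meta, long indexingTimeMs) {
        return meta != null ? meta.ingestionTimeMs : indexingTimeMs;
    }

    public static void main(String[] args) {
        System.out.println(freshnessTimeMs(null, 2_000L));                // no metadata available
        System.out.println(freshnessTimeMs(new RowMeta(1_500L), 2_000L)); // metadata present
    }
}
```

If no stream plugin ever returns metadata, every call takes the fallback branch, which would explain why the reported freshness always equals the indexing time.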
@ken: Is anyone else getting a dependency convergence failure when building from master? Details in thread…
@ken: I ran `mvn clean install -DskipTests -Pbin-dist` from the top, and it failed when building `pinot-spark` with:
```
[WARNING] Dependency convergence error for org.apache.hadoop:hadoop-yarn-api:2.6.5 paths to dependency are:
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.spark:spark-yarn_2.11:2.4.0
      +-org.apache.hadoop:hadoop-yarn-api:2.6.5
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.hadoop:hadoop-yarn-server-tests:2.8.3
      +-org.apache.hadoop:hadoop-yarn-server-nodemanager:2.8.3
        +-org.apache.hadoop:hadoop-yarn-api:2.8.3
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.hadoop:hadoop-yarn-server-tests:2.8.3
      +-org.apache.hadoop:hadoop-yarn-server-resourcemanager:2.8.3
        +-org.apache.hadoop:hadoop-yarn-api:2.8.3
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.hadoop:hadoop-yarn-server-tests:2.8.3
      +-org.apache.hadoop:hadoop-yarn-server-resourcemanager:2.8.3
        +-org.apache.hadoop:hadoop-yarn-server-applicationhistoryservice:2.8.3
          +-org.apache.hadoop:hadoop-yarn-api:2.8.3
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.hadoop:hadoop-yarn-server-tests:2.8.3
      +-org.apache.hadoop:hadoop-yarn-api:2.8.3
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.hadoop:hadoop-yarn-server-tests:2.8.3
      +-org.apache.hadoop:hadoop-yarn-api:2.8.3
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.hadoop:hadoop-minicluster:2.8.3
      +-org.apache.hadoop:hadoop-yarn-server-tests:2.8.3
        +-org.apache.hadoop:hadoop-yarn-api:2.8.3
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-com.holdenkarau:spark-testing-base_2.11:2.4.0_0.14.0
    +-org.apache.hadoop:hadoop-minicluster:2.8.3
      +-org.apache.hadoop:hadoop-yarn-api:2.8.3
and
+-org.apache.pinot:pinot-spark:0.11.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-yarn-api:2.10.1
[WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence failed with message:
Failed while enforcing releasability. See above detailed error message.
```
@ken: Only change I see to jar versions is on April 12th, by PJ Fanning, where the Hadoop version was bumped to 2.10.1
@g.kishore: not sure, how it passed the CI
@ken: Yes, exactly - so maybe my setup is borked? Now I’m wading through dependency graphs :disappointed:
@g.kishore: do you have the PR?
@ken: No, I was working on a different issue, so wanted to start fresh from current Pinot master, but that build failed
@ken: I’ll dig a bit more.
@ken: Looks like modifying the pom to exclude spark-yarn from spark-testing-base is sufficient. But I’m wondering why CI/original PR didn’t fail with the same issue.
@hareesh.lakshminaraya: @hareesh.lakshminaraya has joined the channel

#announcements

@gaetanmorlet: @gaetanmorlet has joined the channel

#getting-started

@dangngoctan2012: @dangngoctan2012 has joined the channel
@priya.shivakumar: @priya.shivakumar has joined the channel
@gunnar.enserro: Hey! My team and I are researching how to implement ML and analytics in our pipeline. It could end up being a bottleneck... What would be good ideas for scaling, placement, and formatting Apache Pinot for ML tasks?
@mayanks: Would like to understand the requirement a bit more. What do these ML tasks do, and how are you planning to use Apache Pinot there?
@arnaud.zdziobeck: @arnaud.zdziobeck has joined the channel
@teehan: @teehan has joined the channel
@gaetanmorlet: @gaetanmorlet has joined the channel
@matthew: @matthew has joined the channel
@tommaso.peresson: @tommaso.peresson has joined the channel
@cesaro.angelo: @cesaro.angelo has joined the channel
@ghita.saouir: @ghita.saouir has joined the channel
@m.ram3sh: @m.ram3sh has joined the channel
@sonam.dp42: @sonam.dp42 has joined the channel

#pinot-docsrus

@steotia: can I get help in approving this ?
@steotia: I don't seem to have write access on this repo
@jlli: just did
@steotia: thank you
@sonam.dp42: @sonam.dp42 has joined the channel
@sonam.dp42: Hi, I've just sent out a PR for an Explain plan doc update: The doc update is based on changes made in this PR: Can someone take a look? cc @steotia

#introductions

@dangngoctan2012: @dangngoctan2012 has joined the channel
@priya.shivakumar: @priya.shivakumar has joined the channel
@arnaud.zdziobeck: @arnaud.zdziobeck has joined the channel
@teehan: @teehan has joined the channel
@gaetanmorlet: @gaetanmorlet has joined the channel
@matthew: @matthew has joined the channel
@tommaso.peresson: @tommaso.peresson has joined the channel
@cesaro.angelo: @cesaro.angelo has joined the channel
@ghita.saouir: @ghita.saouir has joined the channel
@m.ram3sh: @m.ram3sh has joined the channel
@sonam.dp42: @sonam.dp42 has joined the channel
