#general


@swaroop.aj: @swaroop.aj has joined the channel
@niels.it.berglund: Hey everyone! As @karinwolok1 asked what brought me here, I'd like to introduce myself. I am Niels, I am based in Durban, South Africa, and I work as a Software Architect at Derivco. I've been looking at Druid and the like for a while, and recently came across Pinot. I'm here to see what I can learn, and what Pinot can do for me (and the company).
@kohei: @kohei has joined the channel
@alfredk.j: @alfredk.j has joined the channel
@alfredk.j: Hi All! I'd like to introduce myself. I'm Alfred and I'm working as a Senior Developer at UST in Singapore. I'm here to listen and learn more about Pinot.
@jjyliu: @jjyliu has joined the channel
@tymm: Hi, I am using the AddTable command to upload a new schema etc. to Pinot on Docker. How do I update the table, e.g. add/remove columns, after I have added the table? Thanks
  @g.kishore: You can use the cluster manager UI to edit schema
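  (For reference, the same edit can also be made against the controller's schema REST endpoint; a sketch, where the host, port, schema name, and file are placeholders:)
  ```
  curl -X PUT -H "Content-Type: application/json" \
    -d @my-schema.json \
    "http://localhost:9000/schemas/mySchema"
  ```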
@mateus.oliveira: @mateus.oliveira has joined the channel
@leerho: @leerho has joined the channel
@leerho: Hello, this is Lee Rhodes of Apache DataSketches. I would be interested in some feedback as to how you are using our library … what sketches are you using, what you feel works well, and constructive feedback on what could work better, or what problems you would like sketches to address?
  @mayanks: Hello @leerho. One of our count-distinct functions has a Theta-Sketch based implementation (we have an HLL-based function as well). We like theta sketches for their ability to perform set operations. However, our biggest challenge is getting accuracy under control (especially in the case of intersections of unevenly sized sets). Anything you guys can do for that would be really helpful.
  @leerho: Yes, accuracy of distinct counts of intersections (and differences) of sampled sets is difficult. And it is not a shortcoming of the algorithm, per se. It can be proven mathematically that no matter what streaming algorithm you use, if you end up with sampled sets of the original domain, the accuracy of your estimate can be very poor compared with the accuracy of a union operation. We knew this from the outset. This is why we provide the getUpperBound() and getLowerBound() methods that you can use as a tool to warn you, after the fact, if the accuracy goes beyond what you consider acceptable. For example, with a theta sketch configured with K=4096 (logK=12), its accuracy on a single stream or from a merge (union) will asymptote to about +/- 3.1% with 95% confidence: (2 / sqrt(4096)) = .03125. What you can do: after any intersection or difference operation, check how much the expected error has changed by computing `((getUpperBound(2) / getEstimate()) - 1) * sqrt(K)/2`. This will be the factor by which your intersection error exceeds the nominal RSE of the sketch. If this results in a 2, that means the estimated error of that operation will be about twice as large, or in this case, about +/- 6.25% (at 95% confidence). At least this allows you to monitor the relative error of intersections and even determine which operations caused the largest increase in error. You can also try scheduling the sequence of your set operations so that all of your intersections occur either early in the sequence or at the end. Depending on your data, you might find that reordering the sequence helps. Other than that, know that the intersection error of the theta sketches approaches the theoretical limit of what is possible, given a streaming algorithm and limited space. I hope this helps.
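  A minimal Java sketch of that monitoring check, assuming the `org.apache.datasketches.theta` API (`Intersection.update` is named `intersect` in newer releases; the set sizes here are made up for illustration):
  ```
  import org.apache.datasketches.theta.CompactSketch;
  import org.apache.datasketches.theta.Intersection;
  import org.apache.datasketches.theta.SetOperation;
  import org.apache.datasketches.theta.UpdateSketch;

  public class IntersectionErrorCheck {
    public static void main(String[] args) {
      final int k = 4096; // nominal entries (logK = 12); nominal RSE ~ 2/sqrt(K) = 3.125%
      UpdateSketch a = UpdateSketch.builder().setNominalEntries(k).build();
      UpdateSketch b = UpdateSketch.builder().setNominalEntries(k).build();
      for (long i = 0; i < 1_000_000L; i++) { a.update(i); }         // |A| = 1.0M
      for (long i = 900_000L; i < 2_000_000L; i++) { b.update(i); }  // |B| = 1.1M, |A ∩ B| = 100K

      Intersection intersection = SetOperation.builder().buildIntersection();
      intersection.update(a); // intersect(a) in newer releases
      intersection.update(b);
      CompactSketch result = intersection.getResult();

      double estimate = result.getEstimate();
      // Factor by which this operation's error exceeds the sketch's nominal RSE:
      double inflation = ((result.getUpperBound(2) / estimate) - 1) * Math.sqrt(k) / 2;
      System.out.printf("estimate = %.0f, error inflation factor = %.2f%n", estimate, inflation);
    }
  }
  ```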
  @mayanks: Yes, this is helpful. Thanks
  @leerho: Do you have any interest in some of the other sketches: quantiles, frequent items, etc?
  @mayanks: The interest is usually generated by Pinot users. Once we see our users asking for these, we are happy to add those into Pinot.
  @leerho: We have found that very few system users are even aware that these capabilities exist. We would be glad to work with you to promote the possible leveraging of our DataSketches library to your users. There are lots of ways to do this.
  @ken: Hi @leerho I haven’t spent any time seriously thinking about this, but I always wondered if there was a faster way to approximate LLR (log-likelihood ratio) using sketch-like methods (other than just using sketches for approximate counts). I’ve found LLR to be a very useful way to surface outliers in a dataset, but doing the exact computation (say, via map-reduce) can be painful.
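  (For reference, one common form of the LLR statistic in this setting is Dunning's G-test over a contingency table of observed counts; the thread doesn't say which variant is meant, so this is just the textbook version:)
  ```
  G^2 = 2 \sum_{i,j} k_{ij} \ln\frac{k_{ij}}{E_{ij}},
  \qquad
  E_{ij} = \frac{\big(\sum_{j'} k_{ij'}\big)\big(\sum_{i'} k_{i'j}\big)}{N},
  \qquad
  N = \sum_{i,j} k_{ij}
  ```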
  @mayanks: We recently solved an audience-matching use case at LinkedIn using the DataSketches impl in Pinot. We talked about it in one of our meetups, and I am in the process of publishing a LinkedIn blog on the same.
  @mayanks: Happy to collaborate
  @leerho: I’ll have to do some research on LLR. Nonetheless, we have used both Frequent Items and Quantiles for finding outliers as well.
  @leerho: We would be glad to help you with your blog and/or meetups with materials and tutorials. Let us know how we can help.
  @karinwolok1: And if you do anything like that, keep us posted - we'd be happy to cross publish / promote :slightly_smiling_face:
  @leerho: We are actually preparing a Press-release with ASF about our recent graduation. It would be great if you folks could give us a couple of sentences of how useful DataSketches has been for Pinot!
  @leerho: Something with the format: “QUOTE,” said NAME, TITLE at COMPANY. “…MORE…”
@apadhy: @apadhy has joined the channel
@shipengxie: @shipengxie has joined the channel
@jeff: @jeff has joined the channel

#random


@swaroop.aj: @swaroop.aj has joined the channel
@kohei: @kohei has joined the channel
@alfredk.j: @alfredk.j has joined the channel
@jjyliu: @jjyliu has joined the channel
@mateus.oliveira: @mateus.oliveira has joined the channel
@leerho: @leerho has joined the channel
@apadhy: @apadhy has joined the channel
@shipengxie: @shipengxie has joined the channel
@karinwolok1: Random question, @niels.it.berglund - are you related to Tim? Haha
  @dlavoie: I had the same question but didn't dare to ask :smile:
  @karinwolok1: I mean, he said hello in the random group - and I did see him in the meetup on Thursday. Haha
  @dlavoie: Coincidence? I think not :stuck_out_tongue:
  @karinwolok1: life works in mysterious wayssss
  @mayanks: Wow, I had the same question too
  @karinwolok1: Maybe he's the Peyton to the Eli (Manning)
  @karinwolok1: Niels! EVERYONE WANTS TO KNOW hahahaha
@jeff: @jeff has joined the channel

#troubleshooting


@elon.azoulay: I can't seem to select a virtual column from the pinot query console, is it supported? i.e. ```select $segmentName from <table> limit 10```
  @g.kishore: @jackie.jxt ^^
  @jackie.jxt: Are you running the latest version? What is the query response?
  @elon.azoulay: It just returns no rows. I'm running 0.5.0. IIRC I've done this before and it worked.
  @elon.azoulay: empty response
  @elon.azoulay: same for `$docId`
  @elon.azoulay: but I'm using query console
  @elon.azoulay: tried both pql and sql
  @jackie.jxt: There was a fix for the virtual column, let me check
  @jackie.jxt:
  @jackie.jxt: Available in 0.6.0:joy:
  @elon.azoulay: Ok, upgrading :rolling_on_the_floor_laughing: thanks:)
@swaroop.aj: @swaroop.aj has joined the channel
@kohei: @kohei has joined the channel
@alfredk.j: @alfredk.j has joined the channel
@jjyliu: @jjyliu has joined the channel
@mateus.oliveira: @mateus.oliveira has joined the channel
@ken: I’m trying to use the map-reduce job to build segments. In HadoopSegmentGenerationJobRunner.packPluginsToDistributedCache, there’s this code:
```
File pluginsTarGzFile = new File(PINOT_PLUGINS_TAR_GZ);
try {
  TarGzCompressionUtils.createTarGzFile(pluginsRootDir, pluginsTarGzFile);
} catch (IOException e) {
  LOGGER.error("Failed to tar plugins directory", e);
  throw new RuntimeException(e);
}
job.addCacheArchive(pluginsTarGzFile.toURI());
```
This creates a `pinot-plugins.tar.gz` file in the Pinot distribution directory, which is on my server. But as the Hadoop DistributedCache documentation states, “The `DistributedCache` assumes that the files specified via urls are already present on the `FileSystem` at the path specified by the url and are accessible by every machine in the cluster.”
  @ken: So what you get is this error: `java.io.FileNotFoundException: File file:/path/to/distribution/apache-pinot-incubating-0.7.0-SNAPSHOT-bin/pinot-plugins.tar.gz does not exist`
  @ken: I think the job needs to use the staging directory (in HDFS) for this file (and any others going into the distributed cache).
  @g.kishore: what is the fix?
  @ken: I think the tar file (in the snippet above) should be generated in a temp dir and then uploaded to the staging directory, and the staging dir URI is what’s added to the distributed cache
  @ken: I think this might only be an error path through the code when a plugins dir is explicitly provided…trying without it now
  @g.kishore: what do you mean by upload to staging directory
  @g.kishore: I thought the addCacheArchive code is getting executed on the gateway node
  @ken: As part of the job spec file, you include a `stagingDir` configuration.
  @g.kishore: so stagingDir should be on HDFS?
  @ken: And yes, the addCacheArchive() gets called on the server where you start the job. Which is why it has to be provided a URI to a file that’s available on every slave server. So it can’t be a local path.
  @g.kishore: we thought that happens automatically
  @ken: And yes, stagingDir should be on HDFS (when running distributed). And if you don’t specify it as such, the job fails (as it should) because it’s not using the same file system as the input/output directories.
  @ken: From the DistributedCache JavaDocs: “Applications specify the files, via urls (hdfs:// or http://) to be cached via the `JobConf`.”
  @g.kishore: got it! is this how it was from day one?
  @ken: It will work if you run locally, of course, because the file is accessible to the mappers
  @ken: Or if every server has the shared drive mounted that contains the Pinot distribution
  @ken: Those are the only reasons why I think it could work as-is now
  @ken: Maybe @fx19880617 has some insights, I think he wrote this code. I could be reading it wrong, of course…
  @g.kishore: what you are saying makes sense, but I thought job launcher pushes this to worker nodes
  @g.kishore: looks like it’s more of a pull from the worker node
  @ken: It’s a bit confusing…if you use the standard Hadoop command line `-files` parameter (as an example), then the standard Hadoop tool framework will copy the file(s) to HDFS first, before adding them to the JobConf as `hdfs://` paths. In the Pinot code, you need to do this first step (of copying to HDFS) yourself.
  @ken: And then the Hadoop slaves will take care of copying these cache files from HDFS to a local directory (that part you don’t have to do anything special for)
  @g.kishore: > then the standard Hadoop tool framework will copy the file(s) to HDFS first that what I thought would happen when we do it via code, do you know which staging directory will it copy it to?
  @ken: Each Hadoop job has a “staging” directory in the cluster
  @ken: There’s a job-specific directory inside of that, where the archives (jar files), etc get copied
  @ken: Taking off for a bit, I might file a PR for this
  @g.kishore: thanks
  @fx19880617: Thanks @ken
  @ken: I just filed , looking at a fix now.
  @fx19880617: Thanks!
  @fx19880617: I made a change:
  @fx19880617: in the branch
  @fx19880617: can you help validate whether this one works? Then you can submit a PR to fix it!
  @ken: Funny, looks very similar to what I’ve done:
```
protected void packPluginsToDistributedCache(Job job, PinotFS outputDirFS, URI stagingDirURI) {
  File pluginsRootDir = new File(PluginManager.get().getPluginsRootDir());
  if (pluginsRootDir.exists()) {
    try {
      File pluginsTarGzFile = File.createTempFile("pinot-plugins", ".tar.gz");
      TarGzCompressionUtils.createTarGzFile(pluginsRootDir, pluginsTarGzFile);
      // Copy to staging directory
      Path cachedPluginsTarball = new Path(stagingDirURI.toString(), SegmentGenerationUtils.PINOT_PLUGINS_TAR_GZ);
      outputDirFS.copyFromLocalFile(pluginsTarGzFile, cachedPluginsTarball.toUri());
      job.addCacheArchive(cachedPluginsTarball.toUri());
    } catch (Exception e) {
      // assumed error handling, mirroring the original runner's pattern (the pasted snippet was truncated here)
      LOGGER.error("Failed to package and stage plugins", e);
      throw new RuntimeException(e);
    }
  }
}
```
  @ken: Working on a way to unit test…
  @fx19880617: :thumbsup:
  @ken: I’ve also got a change to `addDepsJarToDistributedCache`, which has the same issue
  @ken: I’m hoping to try it out tonight. brb
@leerho: @leerho has joined the channel
@apadhy: @apadhy has joined the channel
@shipengxie: @shipengxie has joined the channel
@pabraham.usa: All Pinot Server pods keep crashing with the following error. Has anyone come across this before?
```
[Times: user=0.02 sys=0.00, real=0.00 secs]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007f104649b6ff, pid=1, tid=0x00007ee665d06700
#
# JRE version: OpenJDK Runtime Environment (8.0_282-b08) (build 1.8.0_282-b08)
# Java VM: OpenJDK 64-Bit Server VM (25.282-b08 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x15c6ff]
#
# Core dump written. Default location: /opt/pinot/core or core.1
#
# An error report file with more information is saved as:
# /opt/pinot/hs_err_pid1.log
```
  @dlavoie: Can you share some details about your `Xmx` and available off heap setup?
  @pabraham.usa: @dlavoie ```jvmOpts: "-Xms512M -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/dev/stdout -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1 "```
  @dlavoie: How much RAM is available to that box?
  @pabraham.usa: Also seeing some WARNs:
```
my-pinot-controller-0 controller 2021/01/26 22:17:09.409 WARN [CallbackHandler] [ZkClient-EventThread-29-my-pinot-zk-cp-zookeeper.logging.svc.cluster.local:2181] Callback handler received event in wrong order. Listener: org.apache.helix.controller.GenericHelixController@362617cf, path: /pinot-quickstart/INSTANCES/Server_my-pinot
my-pinot-controller-0 controller 2021/01/26 22:17:12.039 WARN [ZkBaseDataAccessor] [HelixController-pipeline-task-pinot-quickstart-(83ee1db3_TASK)] Fail to read record for paths: {/pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/51a431e9-c0e1-4e08-9bcb-aee747608526=-101, /pinot-quickstart/INSTANCES/Server_my-pinot-server-0.my-pinot-server-headless.logging.svc.cluster.local_8098/MESSAGES/f9005198-fe85-4cad-84e8-0df918de95d9=-101
```
  @pabraham.usa: The box has 26G RAM
  @dlavoie: Server requires 50% heap and 50% non heap
  @dlavoie: Depending on the segments you have within a server, it’s important to have memory boundaries.
  @dlavoie: Is it a multi-tenant box?
  @pabraham.usa: Sort of; it's single-tenant on k8s, as I restricted it with memory boundaries. 26G is fully available to Pinot.
  @dlavoie: ok, then make sure your k8s memory request is 2x your JVM Xmx
  @dlavoie: that’s a rule of thumb
  @dlavoie: for servers of course
  @dlavoie: Controller and Broker may configure a Xmx that is nearly maxing the k8s memory request
  @pabraham.usa: I increased the xmx to 8G now, the mem req is 26G
  @dlavoie: That’s not what i meant
  @dlavoie: if you have a 4G XMX, ensure you configure a `resources.request.memory: 8Gi` to your pod.
  @dlavoie: By default, if there’s no limit, the pod will think it can use up to 26G of non heap
  @dlavoie: Until the container runtime says no, no, no
  @pabraham.usa: That can happen even if it is 8G right?
  @dlavoie: You need to think about 2 memory configurations.
  @dlavoie: The pod is a “VM”, and the JVM runs inside it. When working with off-heap, the JVM asks the OS how much off-heap it can use. If the pod is configured without a memory limit, it will tell the JVM that 26G is available.
  @dlavoie: That 26G will not be reserved
  @dlavoie: because other pods will also think they can use that.
  @dlavoie: so, having a pod with a hard 8G limit will guarantee that the JVM will not go over the fence.
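  To illustrate the sizing rule (a sketch; the exact keys depend on the Helm chart version, and the sizes follow the 2x guidance above):
  ```
  server:
    jvmOpts: "-Xms4G -Xmx4G -XX:+UseG1GC"
    resources:
      requests:
        memory: 8Gi   # ~2x Xmx: half heap, half off-heap (mmap'd segments)
      limits:
        memory: 8Gi   # hard limit so the JVM sizes off-heap against 8Gi, not the node's RAM
  ```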
  @pabraham.usa: ahh ok, I am actually using an AWS node with 122GB of RAM, and 26GB is the mem request
  @pabraham.usa: 26GB mem request for single Pinot server
  @dlavoie: Ok!
  @dlavoie: then you can even bump it to 12GB xmx if the server pod has 26Gi request
  @pabraham.usa: Thanks @dlavoie, I increased the xmx from 4G to 8G and servers are up
  @pabraham.usa: I can use 12G
  @dlavoie: the rule of thumb is ~50%
  @pabraham.usa: ok, how about `-Xms12G -Xmx12G`?
  @dlavoie: matching Xms to Xmx is always a good thing in a container environment; that memory is requested by the pod anyway, so it would otherwise just be wasted
  @pabraham.usa: great Thanks for the help here
  @pabraham.usa: All the segments are in bad status and search is not working, so I have to restore from backup. If the issue was OOM, then I assume it might have caused some rogue segments. Now finding those and changing the offset in ZooKeeper to skip them will be hard..!!!
  @dlavoie: Are segments being reloaded?
  @dlavoie: They will remain in a bad state until everything is reloaded.
  @pabraham.usa: Ohh, triggered a reload now; we'll see how it goes
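  (For reference, a segment reload can be triggered via the controller REST API; a sketch with placeholder host and table name:)
  ```
  curl -X POST "http://localhost:9000/segments/myTable/reload"
  ```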
  @dlavoie: watch your CPU and Disk IO, that’s a good tell of what’s happening
  @pabraham.usa: yes both spiked, especially disk
  @pabraham.usa: and now all segments came back to good, and Pinot is trying to catch up with the stream.
  @pabraham.usa: catching up very slowly though
  @dlavoie: _cries in 5-hour segment reloads_
@jeff: @jeff has joined the channel

#pinot-docs


@ken: I had a few issues/questions about the batch data ingestion documentation; I’m guessing I'm mostly looking for input from @fx19880617
  @ken: When using the example job spec for Hadoop, this line caused a problem: `# 'glob:**\/*.avro' will include all the avro files under inputDirURI recursively.` Looks like even though it’s commented out, the parser complains about the `\` character, as in ```SimpleTemplateScript1.groovy: 37: unexpected char: '\' @ line 37, column 13.```
  @fx19880617: ah, yes, please delete that
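  (For reference, the corrected job-spec lines, with the escape removed so the Groovy template parser is happy; `includeFileNamePattern` is the standard ingestion-spec field:)
  ```
  # 'glob:**/*.avro' will include all the avro files under inputDirURI recursively.
  includeFileNamePattern: 'glob:**/*.avro'
  ```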
  @ken: Also I think the last section (currently `Tunning`, should be `Tuning`) only applies to running a batch job locally, as setting `JAVA_OPTS` doesn’t impact Hadoop jobs, right?
  @fx19880617: I feel it could be an issue with the groovy lib version
  @ken: Right - I guess I could file an issue about that
  @fx19880617: Thanks!
  @fx19880617: for hadoop, there is a way to set slave executor size
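  (For example, mapper sizing is typically controlled with the standard MapReduce properties rather than `JAVA_OPTS`; a sketch with example values:)
  ```
  mapreduce.map.memory.mb=4096      # YARN container size per mapper
  mapreduce.map.java.opts=-Xmx3g    # JVM heap inside that container
  ```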
  @fx19880617: I will make the changes