#general


@nicholas.yu2: hello friends, i’m looking for information about running Spark batch ingestion jobs on AWS EMR. thanks
  @mayanks: Hi Nicholas, please check out:
@bsa0393: @bsa0393 has joined the channel
@karinwolok1: :pencil2: Vote on features and improvements you'd like to see in Apache Pinot 2022! :pencil2: :spiral_calendar_pad: And join us Monday for the 2021 recap and future roadmap discussion! :call_me_hand:
@karinwolok1: Hey all! :mega: *Kafka Summit London is looking for speakers!* :mega: Interested in speaking? You have until Dec 20 to submit, so send in your talk now!!! :partying_face:
@ethan.wayne: @ethan.wayne has joined the channel
@brooks85.ty: @brooks85.ty has joined the channel
@vmahrwald: @vmahrwald has joined the channel
@brooks85.ty: In the docs, there are references to the “Filesystem backend” and “Deep Storage”… are those meant to be conceptually synonymous?
  @mayanks: Regarding the persistent storage attached to the controller, yes
@cprokopiak: @cprokopiak has joined the channel
@navina: @navina has joined the channel
@ljurukov: @ljurukov has joined the channel
@chnzhoujun: @chnzhoujun has joined the channel

#random


@bsa0393: @bsa0393 has joined the channel
@ethan.wayne: @ethan.wayne has joined the channel
@brooks85.ty: @brooks85.ty has joined the channel
@vmahrwald: @vmahrwald has joined the channel
@cprokopiak: @cprokopiak has joined the channel
@navina: @navina has joined the channel
@ljurukov: @ljurukov has joined the channel
@chnzhoujun: @chnzhoujun has joined the channel

#troubleshooting


@bsa0393: @bsa0393 has joined the channel
@alihaydar.atil: Hi everyone, is there a character limit for the STRING data type? it seems like the value is truncated to the first 512 characters. is there any way to configure the string length?
  @msoni6226: You can configure the maxLength for a string column, please refer to the link attached below:
  @alihaydar.atil: @msoni6226 thank you
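For anyone finding this thread later: the knob is the `maxLength` property on the field spec in the table schema, which overrides the 512-character default. A minimal sketch (schema and column names here are hypothetical):

```json
{
  "schemaName": "myTable",
  "dimensionFieldSpecs": [
    {
      "name": "description",
      "dataType": "STRING",
      "maxLength": 2048
    }
  ]
}
```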
@tiger: It looks like the MAX function truncates large longs and loses precision. For example, `select MAX(1639054811930692679) from table` returns `1.63905481193069261E18` . Is this behavior expected?
  @jadami: currently it is. all of these scalar aggregation functions cast to double. i’ll let someone else chime in on what the plan is for the future here, though
  @tiger: Would it make sense to keep the original type instead?
  @g.kishore: it does. please file an issue and also add it to
  @tiger: Submitted an issue:
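For context on why this happens: an IEEE-754 double has a 53-bit mantissa, so any long above 2^53 cannot be represented exactly once the aggregation casts to double. A quick sketch in Python, whose `float` is the same 64-bit double:

```python
# Longs above 2**53 lose precision when cast to an IEEE-754 double,
# which is what the scalar aggregation functions do internally.
big = 1639054811930692679      # the value from the MAX() query above

as_double = float(big)         # what the aggregation effectively returns

# The round-trip does not recover the original long:
assert int(as_double) != big

# At this magnitude (between 2**60 and 2**61) adjacent doubles are
# 2**(60-52) = 256 apart, so the error is bounded by half that spacing.
assert abs(int(as_double) - big) <= 128
```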
@ethan.wayne: @ethan.wayne has joined the channel
@brooks85.ty: @brooks85.ty has joined the channel
@bagi.priyank: how do i remove the dead controller entries?
  @mayanks: Do they appear in the ZK browser as well, or just in the UI? If the latter, it is a UI issue; can you file a GH issue, and I'll follow up on that?
  @bagi.priyank: in zk as well
  @npawar: did you try the drop instance API?
  @bagi.priyank: i did not. didn't know about it. let me try that.
  @bagi.priyank: that worked great. thank you!
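For reference, the call in question is the controller's drop-instance endpoint. A sketch with curl (the controller address and instance id below are placeholders for your deployment):

```shell
# List instances first to find the exact id of the dead controller
curl -X GET "http://localhost:9000/instances"

# Drop the dead instance (replace the id with your dead controller's name)
curl -X DELETE "http://localhost:9000/instances/Controller_192.168.1.10_9000"
```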
@vmahrwald: @vmahrwald has joined the channel
@bagi.priyank: not really a troubleshooting question. `The deep store stores a compressed version of the segment files and it typically won't include any indexes.` will the index always be in memory? is the index re-computed when a server loads a segment from the deep store? is there a way to view the size of the index?
  @mark.needham: I don't know that it will always be in memory, but the servers have a local copy of segments + indexes for the segments assigned to them.
  @mark.needham: Have a look at the end of this post to see how to view segments locally -
  @mark.needham: I don't yet know exactly how to work out the size of individual indexes because that data is stored inside the `columns.psf` file, which contains everything for all the columns in a segment
  @mark.needham: @mayanks might know if there's a tool that can break it down
  @mayanks: If the compressed segment was generated by realtime server (after committing the segment), it does have the index.
  @mayanks: If it was generated by an offline job, generating the index there is optional, and when servers load the segment, they will create any missing inverted index.
  @mayanks: There's a REST API in Swagger to get the table size, which includes size with indexes
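For reference, that call looks like this (the controller address and table name are placeholders):

```shell
# Reported vs. estimated table size, broken down per segment
curl -X GET "http://localhost:9000/tables/myTable/size"
```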
  @bagi.priyank: what about memory vs. disk for the star-tree index in particular, with respect to a realtime server? i will check out the swagger api. thanks for all the info guys!
  @bagi.priyank: and if the index is included in the segment, does it mean that only the consuming segment is in memory, and the rest is on the disk?
  @mark.needham: @bagi.priyank have a look at the blog post for which endpoint to call on the Swagger API
  @bagi.priyank: i did (via swagger) and also looked at the blog post... ```{ "tableName": "km_mp_play_startree", "reportedSizeInBytes": -1, "estimatedSizeInBytes": -1, "offlineSegments": null, "realtimeSegments": { "reportedSizeInBytes": -1, "estimatedSizeInBytes": -1, "missingSegments": 1413, "segments": { "km_mp_play_startree__67__4__20211209T2209Z": { "reportedSizeInBytes": -1, "estimatedSizeInBytes": -1, "serverInfo": { "Server_10.220.9.195_8098": { "segmentName": "km_mp_play_startree__67__4__20211209T2209Z", "diskSizeInBytes": -1 } } },``` all 1413 segments have `reportedSizeInBytes` , `estimatedSizeInBytes` and `diskSizeInBytes` as -1
  @bagi.priyank: possibly because of connection timeouts. how can i configure higher timeouts? ```2021/12/09 23:20:42.380 ERROR [CompletionServiceHelper] [grizzly-http-server-4] Connection error java.util.concurrent.ExecutionException: org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 30000 ms at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?] at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?] at org.apache.pinot.controller.util.CompletionServiceHelper.doMultiGetRequest(CompletionServiceHelper.java:79) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.pinot.controller.api.resources.ServerTableSizeReader.getSegmentSizeInfoFromServers(ServerTableSizeReader.java:69) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.pinot.controller.util.TableSizeReader.getTableSubtypeSize(TableSizeReader.java:181) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.pinot.controller.util.TableSizeReader.getTableSizeDetails(TableSizeReader.java:101) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.pinot.controller.api.resources.TableSize.getTableSize(TableSize.java:83) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?] at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?] at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?] at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?] 
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:80) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:253) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) 
[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:232) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:679) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:353) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:200) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at 
java.lang.Thread.run(Thread.java:829) [?:?] Caused by: org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 30000 ms at org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:155) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.pinot.common.http.MultiGetRequest.lambda$execute$0(MultiGetRequest.java:106) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] 
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?] ... 1 more Caused by: java.net.SocketTimeoutException: connect timed out at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?] at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[?:?] at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242) ~[?:?] at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224) ~[?:?] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?] at java.net.Socket.connect(Socket.java:609) ~[?:?] at jdk.internal.reflect.GeneratedMethodAccessor40.invoke(Unknown Source) ~[?:?] at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?] at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?] 
at org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:140) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at org.apache.pinot.common.http.MultiGetRequest.lambda$execute$0(MultiGetRequest.java:106) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?] ... 1 more```
  @bagi.priyank: and i am still not clear on whether the index is completely in memory or on disk.
  @mayanks: The -1 is for consuming segments typically
  @mayanks: You shouldn't get -1 for a committed segment.
  @bagi.priyank: because of connect timeouts in controller logs?
  @mayanks: perhaps
  @mayanks: For RT, segments are either `consuming` - in-memory index, not committed to disk
  @mayanks: or `committed` - committed to disk/storage.
  @mayanks: Segments when on disk are memory-mapped, so pages are pulled in/out as needed.
  @bagi.priyank: i see - so the index for consuming segments is in memory, and based on a query, the index for a committed segment can be paged in/out? and is there a way to configure the connection timeout for these api calls?
  @mayanks: yes
  @mayanks: which api call?
  @bagi.priyank: the one to get table size...or for that matter other calls between controller / broker / server?
  @bagi.priyank: i am guessing table size specifically is between controller and server?
  @mayanks: Ok, for the table size api, I am not sure if there's a timeout config.
  @mayanks: What is your high level goal here?
  @bagi.priyank: just to be able to monitor, i guess. i was planning to play with different fields in the index and see how they affect table size and queries. one thing we are thinking of doing is integrating with Looker so users can select different parameters for the query etc. it would also help us estimate resources needed to host the cluster.
  @mayanks: ok. you can log on to the server and look at the segments in the data dir
  @mayanks: the table size api should be super fast btw, not sure why you are hitting timeout
  @bagi.priyank: i can look into that tomorrow. but honestly thanks a ton to you guys! there were a lot of things that i was not clear about. today was great. hats off to your patience dealing with me.
@cprokopiak: @cprokopiak has joined the channel
@navina: @navina has joined the channel
@ljurukov: @ljurukov has joined the channel
@chnzhoujun: @chnzhoujun has joined the channel

#pinot-dev


@jeff.moszuti: @jeff.moszuti has joined the channel

#getting-started


@msoni6226: @msoni6226 has joined the channel
@jeff.moszuti: @jeff.moszuti has joined the channel
@luisfernandez: does anyone have any best practice advice when it comes to maintaining changes to schemas/tables, like version control and whatnot?
  @luisfernandez: ideally i would like to have something set up where all these changes are version controlled and then they are deployed into pinot by something else rather than us
  @g.kishore: we used to have the configs checked into GitHub and a script that would apply the changes
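A minimal sketch of that setup (repo layout, table names, and controller address are all assumptions): keep the schema/table JSON in a git repo and have a script or CI job push it through the controller's REST API.

```shell
# Hypothetical repo layout:
#   pinot-configs/
#     schemas/myTable.schema.json
#     tables/myTable.table.json

CONTROLLER="http://localhost:9000"

# Create or update the schema (override=true makes re-runs idempotent)
curl -X POST "${CONTROLLER}/schemas?override=true" \
  -H "Content-Type: application/json" \
  -d @schemas/myTable.schema.json

# Update the table config to match
curl -X PUT "${CONTROLLER}/tables/myTable" \
  -H "Content-Type: application/json" \
  -d @tables/myTable.table.json
```

Running the script from CI on merge keeps the cluster in sync with whatever is checked in.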

#udf-type-matching


@jadami: @jadami has joined the channel
@hristo: @hristo has joined the channel