Apache Pinot Daily Email Digest (2021-04-14)

2021-04-14 Thread Pinot Slack Email Digest
#general
@fx19880617: Hello Community,

We’re happy to announce the release of :wine_glass: Apache Pinot 0.7.1!

This release includes several awesome new features  :page_with_curl: :earth_americas: :unlock: :
```- JSON index
- Lookup-based join support
- Geospatial support
- TLS support for Pinot connections
- Introduced new APIs for segment management and offline table push.
- Various performance optimizations, improvements and bug fixes.```
Please also see the full release notes here: 
The release can be downloaded at 

Additional resources -
Project website: 
Getting started: 
Pinot developer blogs: 
Intro to Pinot Video: 
Twitter: 
Meetup:
@vananth22: ```Lookup-based join support``` is the game changer. Thanks for adding it!!!
@mailtobuchi: Great features. Would love to take the `Lookup join feature` for a toss.
@havjyan: @havjyan has joined the channel
@gabuglc: Hey guys, what is the correct way to add a table/schema from Kafka via the UI?
@gabuglc:
@mayanks: The error seems to suggest that the table config is missing the schema name?
@gabuglc: Yes, I'm creating the schema and the table at the same time. Table is on the left, schema is on the right.
@mayanks: I mean there is supposed to be a schema field in the table config JSON that refers to the name of the schema on the right.
@gabuglc: Isn't it schemaName in the table config?
@mayanks: Ah yes, didn't catch it the first time.
@mayanks: Can you upload the schema first and then create the table?
@jackie.jxt: FYI, the `schemaName` is not mandatory. By default the table will link to the schema with the same name.
@jackie.jxt: @npawar Can you please take a look and see if it is a bug?
@npawar: This is a very old version, I don't know how this is supposed to behave.
@npawar: Can you upgrade?
@gabuglc: I'm using 0.6.0.
@gabuglc: And I only got these options
@npawar: Can you use the latest tag?
@gabuglc: Just updated. Ty
@mayanks: @gabuglc Did that solve the issue?
@gabuglc: Yes, thanks a lot
@aaron: Is there anything I can do to make batch import faster? It seems like most of the time is spent processing the Parquet files I'm importing, but I still don't see very high CPU usage on my machine (in particular, most cores are not busy). I see stuff like this in the logs:
```Apr 14, 2021 3:16:33 PM INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: time spent so far 0% reading (1854 ms) and 99% processing (311813 ms)```
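The progress line quoted above can be turned into numbers to see how lopsided the read/processing split is. This is a minimal sketch that assumes only the log format shown in that line; the function name `parse_progress` and the returned keys are hypothetical and not part of Parquet or Pinot.

```python
import re

# Matches the InternalParquetRecordReader progress message quoted above.
PROGRESS = re.compile(
    r"time spent so far (\d+)% reading \((\d+) ms\) "
    r"and (\d+)% processing \((\d+) ms\)"
)


def parse_progress(line):
    """Return read/processing times in ms, or None if the line does not match."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    _read_pct, read_ms, _proc_pct, proc_ms = (int(g) for g in match.groups())
    total_ms = read_ms + proc_ms
    return {
        "read_ms": read_ms,
        "processing_ms": proc_ms,
        # Fraction of the measured time spent processing rather than reading.
        "processing_share": proc_ms / total_ms if total_ms else None,
    }


if __name__ == "__main__":
    sample = (
        "Apr 14, 2021 3:16:33 PM INFO: "
        "org.apache.parquet.hadoop.InternalParquetRecordReader: "
        "time spent so far 0% reading (1854 ms) and 99% processing (311813 ms)"
    )
    print(parse_progress(sample))
```

Running this over a whole ingestion log would show whether the processing share stays this high for every file or only for some of them.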
Is there a setting to use more cores to process segments in parallel or anything like that?
@dlavoie: What about your disk I/O?
@aaron: Looking at some system stats, disk I/O seems really low: writes on the order of 100 MB/sec, reads on the order of 8 MB/sec.
@dlavoie: What kind of disks are we talking about? To some extent, 100 MB/sec could be a bottleneck.
@aaron: Looking into that now! Good call.
@aaron: This is an NVMe under a virtualization layer.
@zhong.chen: @zhong.chen has joined the channel
#random
@havjyan: @havjyan has joined the channel
@zhong.chen: @zhong.chen has joined the channel
#troubleshooting
@phuchdh: Hello guys, I have an issue with the RealtimeToOfflineSegments task.
So I created 2 hybrid tables on the same day: 1 for the QC env and 1 for the UAT env.
• In the table management view, it seems the `RealtimetoOfflines` task in the QC env has stopped, but I cannot find any error logs.
• Another question: will the realtime segments be removed after they are converted to offline table segments?
@fx19880617: Have you checked the logs in minion?
@phuchdh: Here are the logs of the minion pods.
@phuchdh: Sometimes my ZooKeeper pods roll out because the VMs are preemptible in gcloud.
@fx19880617: I see, maybe have 3 pinot-zookeepers for HA?
@phuchdh: I already set up 3 ZooKeepers for HA.
@fx19880617: OK
@fx19880617: So if there are no task logs on minion, it means the minion tasks are not scheduled.
@fx19880617: Can you check the controller log and look for `RealtimetoOffline` logs?
@phuchdh: Only 1 log line shows up when grepping for "realtime".
@fx19880617: Hmm, is this task scheduled? Can you check the minion APIs through the controller Swagger UI?
@phuchdh: Are the minion APIs under Task in Swagger?
@phuchdh: Could you answer question 2:
```Another question: will the realtime segments be removed after they are converted to offline table segments?```
@fx19880617: I don't think so. This task requires a hybrid table; it creates segments and pushes them to the offline table. You can set a fairly low retention for the realtime table but a longer one for the offline table.
@fx19880617:
@laxman: Found the root cause. This is possibly due to a bug in Groovy.

From thread dumps we see all message handlers slowly going into the following state and getting stuck there.
```"HelixTaskExecutor-message_handle_thread" #51 daemon prio=5 os_prio=0 cpu=70457.28ms elapsed=4885.80s tid=0x7fe4e43d6000 nid=0x6e waiting for monitor entry  [0x7fe4aa6e5000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.codehaus.groovy.reflection.ClassInfo$GlobalClassSet.add(ClassInfo.java:477)
	- waiting to lock <0x000702ed2218> (a 
```

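The retention suggestion in the #troubleshooting thread above (short retention on the realtime table, longer on the offline table) could be sketched as follows. This is a minimal illustration, not the configuration anyone in the thread actually used: the table name `myTable`, the 2-day and 30-day values, and the helper functions are all hypothetical. Only the `segmentsConfig` keys `retentionTimeUnit` and `retentionTimeValue` are taken from Pinot's table config format.

```python
from datetime import timedelta

# Hypothetical table name and retention values, chosen only for illustration.
TABLE_NAME = "myTable"


def segments_config(retention_days):
    """Build the retention part of a table config's segmentsConfig section."""
    return {
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": str(retention_days),
    }


# The two halves of a hybrid table share a name and differ in tableType.
realtime_table = {
    "tableName": TABLE_NAME,
    "tableType": "REALTIME",
    "segmentsConfig": segments_config(2),   # assumed: keep realtime data briefly
}
offline_table = {
    "tableName": TABLE_NAME,
    "tableType": "OFFLINE",
    "segmentsConfig": segments_config(30),  # assumed: keep offline data longer
}


def retention(table_config):
    """Return the configured retention as a timedelta (DAYS only in this sketch)."""
    seg = table_config["segmentsConfig"]
    if seg["retentionTimeUnit"] != "DAYS":
        raise ValueError("this sketch only handles retentionTimeUnit=DAYS")
    return timedelta(days=int(seg["retentionTimeValue"]))


def realtime_expires_first(realtime, offline):
    """True when realtime segments age out before their offline copies do."""
    return retention(realtime) < retention(offline)


if __name__ == "__main__":
    print(realtime_expires_first(realtime_table, offline_table))
```

The realtime retention still has to be long enough for the minion task to move each time window to the offline table before those segments are purged.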
[ANNOUNCE] Apache Pinot (incubating) 0.7.1 released

2021-04-14 Thread Fu Xiang
Hello Community,

We are pleased to announce that Apache Pinot (incubating) 0.7.1 is released!

Apache Pinot (incubating) is a distributed columnar storage engine that can
ingest data in real-time and serve analytical queries at low latency.

The release can be downloaded at https://pinot.apache.org/download

The release notes are available at
https://docs.pinot.apache.org/basics/releases/0.7.1

Additional resources -
Project website: https://pinot.apache.org
Getting started: https://docs.pinot.apache.org/getting-started
Pinot developer blogs: https://medium.com/apache-pinot-developer-blog
Intro to Pinot Video: https://www.youtube.com/watch?v=T70jTTYhYyM

Join Pinot Community -
Twitter: https://twitter.com/ApachePinot
Meetup: https://www.meetup.com/apache-pinot/
Slack channel: https://communityinviter.com/apps/apache-pinot/apache-pinot


Best Regards,

Apache Pinot (incubating) Team