#general


@satyam.raj: @satyam.raj has joined the channel
@satyam.raj: Hi everyone! I’ve been doing a POC on Pinot, and I'm currently facing an issue while ingesting ORC file data into Pinot. Filed a GH issue as well: Can anyone help?
  @kharekartik: Hi, can you add your schema and table config to the issue as well? Do remove the secret values.
  @satyam.raj: Updated @kharekartik
  @kharekartik: To me it seems like the column names in your ORC file and the column names in your schema file do not match. They should be the same.
  @satyam.raj: How can I get the exact column name from the orc file?
  @kharekartik: `java -jar orc-tools-X.Y.Z-uber.jar meta your-file.orc` should print the schema
  @satyam.raj: I guess the columns are named as `_col0, _col2` and so on
  @kharekartik: `java -jar orc-tools-1.5.5-uber.jar meta 000000_0` this should work in your case
  @kharekartik: Can you paste the metadata you got from command here?
  @satyam.raj: ```
➜ batchjob-spec java -jar orc-tools-1.5.5-uber.jar meta 000000_0
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See for more info.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/Users/satyam.raj/dataplatform/pinot-dist/batchjob-spec/orc-tools-1.5.5-uber.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Processing data file 000000_0 [length: 8321467]
Structure for 000000_0
File Version: 0.12 with HIVE_8732
Rows: 723010
Compression: ZLIB
Compression size: 262144
Type: struct<_col0:string,_col1:string,_col2:string,_col3:string,_col4:string,_col5:string,_col6:string,_col7:string,_col8:string,_col9:string,_col10:string,_col11:string,_col12:string,_col13:int,_col14:int,_col15:int,_col16:string,_col17:string,_col18:date,_col19:date,_col20:date,_col21:string,_col22:string>
Stripe Statistics:
  Stripe 1:
    Column 0: count: 723010 hasNull: false
    Column 1: count: 723010 hasNull: false min: 1000 max: 99999750 sum: 6114370
    Column 2: count: 723010 hasNull: false min: customer max: customer sum: 5784080
    Column 3: count: 723010 hasNull: false min: Birmingham max: wollongong sum: 2285843
```
  @kharekartik: Yep, then you will have to use the same column names in the schema. If you want new column names, you can use `transformConfigs` in the table config file
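For reference, a rough sketch of what such a rename could look like under `ingestionConfig` in the table config. This is only an illustration: the target column names (`customerId`, `customerType`) are hypothetical, and the `Groovy({...}, ...)` form is just one way to express an identity mapping from the auto-generated ORC columns:
```
{
  "ingestionConfig": {
    "transformConfigs": [
      { "columnName": "customerId",   "transformFunction": "Groovy({_col0}, _col0)" },
      { "columnName": "customerType", "transformFunction": "Groovy({_col2}, _col2)" }
    ]
  }
}
```
The schema would then declare `customerId` and `customerType` instead of the raw `_colN` names.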
  @satyam.raj: Alright, thanks! One more question: what should I be using as the data type for the `date` fields in ORC?
  @kharekartik: long
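A minimal sketch of the corresponding schema entry, assuming the `date` values end up as epoch milliseconds after ingestion (the column name `orderDate` is made up; if the values arrive as epoch days instead, the format would be `1:DAYS:EPOCH`):
```
{
  "dateTimeFieldSpecs": [
    {
      "name": "orderDate",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:DAYS"
    }
  ]
}
```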
  @satyam.raj: It worked :tada:
@goyal3593: @goyal3593 has joined the channel
@zaikhan: Does Pinot support using `pinot-jdbc-client` in JMeter, i.e. perf testing Pinot's query performance with JMeter's JDBC Request Sampler?
  @mayanks: Please use pinot-java-client for querying pinot for any perf testing.
@varun.j: @varun.j has joined the channel
@satyammast: @satyammast has joined the channel
@skondapalli: @skondapalli has joined the channel
@drojas: @drojas has joined the channel
@tonya: Hey folks! @dunithd is doing a virtual meetup tomorrow on analyzing IoT data with Apache Pinot and Kafka. Please register if you can join us! :speaking_head_in_silhouette: :computer:

#random


@satyam.raj: @satyam.raj has joined the channel
@goyal3593: @goyal3593 has joined the channel
@varun.j: @varun.j has joined the channel
@satyammast: @satyammast has joined the channel
@skondapalli: @skondapalli has joined the channel
@drojas: @drojas has joined the channel

#troubleshooting


@jmeyer: Hello all ^^ I was looking into Slack history, trying to find an answer to my question - couldn't seem to find any, so here I go.
We're doing standalone (apache/pinot Docker image in a WF) batch integrations, and we're seeing queries hitting Pinot before the integrated data is available ("stale data").
My use case is that we're doing data integration, firing off a Kafka event (after the `pinot-admin` step is finished), then querying Pinot; that's where we're seeing stale data.
Is there any way to
• Have `./bin/pinot-admin.sh LaunchDataIngestionJob` wait for the data to be fully query-able?
• Have Pinot somehow notify when data becomes fully query-able?
NOTE: Job type is `SegmentCreationAndTarPush`
  @mayanks: Is this for production or for testing? If for testing, you probably have some options:
```
1. Wait for a fixed time (might be a bit brittle).
2. Wait for IS == EV (might need to write some checks for this).
```
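For option 2, "wait for IS == EV" means waiting until the table's ExternalView matches its IdealState. One rough way to check this is to poll the controller's idealstate and externalview endpoints until they agree. A minimal sketch, assuming a controller at `localhost:9000`, a hypothetical table `myTable`, and `jq` on the path:
```
#!/usr/bin/env bash
# Sketch only: poll until the table's ExternalView matches its IdealState.
# CONTROLLER and TABLE are assumptions for illustration.
CONTROLLER="http://localhost:9000"
TABLE="myTable"

while true; do
  is=$(curl -s "${CONTROLLER}/tables/${TABLE}/idealstate"   | jq -S .)
  ev=$(curl -s "${CONTROLLER}/tables/${TABLE}/externalview" | jq -S .)
  if [ "$is" = "$ev" ]; then
    echo "ExternalView matches IdealState; new segments should be queryable."
    break
  fi
  echo "Views differ; retrying in 5s..."
  sleep 5
done
```
A naive whole-JSON comparison like this can be stricter than necessary; comparing per-segment states would be the more robust check.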
  @jmeyer: It is for production
  @mayanks: In production, what does it mean to be fully queryable? You will have data constantly being pushed right?
  @jmeyer: We've got batch data that comes in, and some materialized view that needs to be updated by taking into account the newly integrated data
  @mayanks: So you need atomic push? If so, @snlee added this feature?
  @snlee: @mayanks the building blocks are there but we need to implement the client. Also, we support REFRESH only.
  @jmeyer: > So you need atomic push? If so, @snlee added this feature?
Not quite. Data -> Pinot job -> Kafka message saying "new data available" -> Other service queries a service backed by Pinot (which is "stale" until it completes ingestion / indexing)
  @snlee: @jmeyer If you have a realtime table, you won’t have this staleness since the data is updated in near-realtime fashion whenever new data gets ingested. You will have this issue only with an offline table. We currently don’t provide a way to notify when new data becomes available. A generic way to provide this functionality would be to expose an interface so that the user can supply a function that is executed after the offline ingestion completes. Feel free to file an issue on GitHub for the feature request.
  @jmeyer: Yes, my use case is with an offline table. What you propose sounds like a good way to solve my issue, thanks @snlee, will do!
@satyam.raj: @satyam.raj has joined the channel
@goyal3593: @goyal3593 has joined the channel
@ysuo: Hi team, a 'java.lang.IllegalArgumentException: must provide a password for admin' error occurred when I used './bin/pinot-admin.sh StartBroker -configFileName ./conf/pinot-broker-7011.conf' to start the Pinot broker. I have the same config as listed in the example. I started the Pinot controller according to the example config and it started successfully. Any idea what’s wrong with the broker config?
pinot.broker.access.control.class=org.apache.pinot.broker.broker.BasicAuthAccessControlFactory
pinot.broker.access.control.principals=admin,user
pinot.broker.access.control.principals.admin.password=verysecret
pinot.broker.access.control.principals.user.password=secret
  @mayanks: From the code it seems like it is unable to find the password for `admin` in the property. However, your settings in the conf seem correct. Also, the exact same configs work in the integration test, so this is a bit confusing. My guess is there is something else going on in your setup that is causing the property to be not set?
@varun.j: @varun.j has joined the channel
@satyammast: @satyammast has joined the channel
@skondapalli: @skondapalli has joined the channel
@drojas: @drojas has joined the channel

#pinot-dev


@haitao: @haitao has joined the channel

#getting-started


@satyam.raj: @satyam.raj has joined the channel
@goyal3593: @goyal3593 has joined the channel
@varun.j: @varun.j has joined the channel
@satyammast: @satyammast has joined the channel
@skondapalli: @skondapalli has joined the channel
@drojas: @drojas has joined the channel