#general


@humengyuk18: Does Pinot support changing an existing column name in the schema? I tried changing a column name, but got the following exception on query: ```[ { "errorCode": 500, "message": "MergeResponseError:\nData schema mismatch between merged block: [time_to_hour(LONG),age_decade(STRING),age_level(STRING),city(STRING),company_id(STRING),company_name(STRING),count_impression(LONG),count_in(LONG),count_passby(LONG),create_time(LONG),day(STRING),day_in_week(STRING),district(STRING),gate_id(STRING),gender(STRING),holiday_id(STRING),holiday_name(STRING),hour(STRING),is_holiday(STRING),month(STRING),province(STRING),region(STRING),shop_id(STRING),shop_name(STRING),temperature(STRING),temperature_id(STRING),total_duration(LONG),total_impression_duration(LONG),weather_cate_id(STRING),weather_cate_name(STRING),year(STRING)] and block to merge: [time_to_hour(LONG),age_decade(STRING),age_level(STRING),city(STRING),company_id(STRING),company_name(STRING),count_impression(LONG),count_in(LONG),count_passby(LONG),create_time(LONG),day(STRING),day_in_week(STRING),district(STRING),gate_id(STRING),gender(STRING),holiday_id(STRING),holiday_name(STRING),hour(STRING),is_holiday(STRING),month(STRING),province(STRING),region(STRING),shop_id(STRING),shop_name(STRING),temperature(STRING),temperature_id(STRING),total_duration(LONG),total_impression_duraion(LONG),weather_cate_id(STRING),weather_cate_name(STRING),year(STRING)], drop block to merge" } ]```
  @mayanks: Hello, schema evolution is supported as long as it is backward compatible. Changing a column name or type is considered backward incompatible, and is not supported
  @humengyuk18: Thanks, so in this case, should I delete all the segments and re-ingest all the data?
  @mayanks: Yes, for incompatible schema change, that is the option
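The two column lists in the `MergeResponseError` above are long and differ in only one name, which is hard to spot by eye. A minimal Python sketch for diffing two such lists follows; the helper names `parse_data_schema` and `schema_diff` are hypothetical, and the `name(TYPE),name(TYPE)` layout is assumed from the error text rather than from any Pinot API. The sample strings are a three-column excerpt of the lists in the error.

```python
import re
from itertools import zip_longest


def parse_data_schema(text):
    """Parse 'name(TYPE),name(TYPE)' into a list of (name, TYPE) pairs."""
    return re.findall(r"(\w+)\((\w+)\)", text)


def schema_diff(merged, to_merge):
    """Return the positions where the two column lists disagree."""
    left = parse_data_schema(merged)
    right = parse_data_schema(to_merge)
    return [(a, b) for a, b in zip_longest(left, right) if a != b]


# Excerpt of the two lists from the error message above.
merged_block = "total_duration(LONG),total_impression_duration(LONG),weather_cate_id(STRING)"
block_to_merge = "total_duration(LONG),total_impression_duraion(LONG),weather_cate_id(STRING)"

for old, new in schema_diff(merged_block, block_to_merge):
    print("mismatch:", old, "vs", new)
```

Run against the full lists, this points at `total_impression_duration` versus `total_impression_duraion`: some segments still carry the old column name, which is why the merged blocks no longer agree.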
@pankaj: If we extend a table schema in Pinot to add new columns (so it does not break backward compatibility), do we have to backfill data, or can Pinot use null/default values to handle the older segments?
  @mayanks: Pinot can auto fill null/default value in this case
  @npawar: Pinot can also fill in a derived value, i.e. if the value of the new column is derived from existing columns, Pinot will calculate it using the function you provide.
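As a rough illustration of the two options above (a default value, or a value derived from existing columns), here is a Python sketch that adds one dimension column to a schema and, optionally, a transform to the table config. The column names `store_type` and `create_time_hours` and the helper `add_dimension` are hypothetical. The keys `defaultNullValue` and `ingestionConfig.transformConfigs`, and the `toEpochHours` function, reflect my understanding of Pinot's config layout and should be checked against the documentation for your version. Older segments typically pick up a new column only after a segment reload, which is also an assumption here.

```python
import copy
import json


def add_dimension(schema, table_config, name, data_type, default=None, transform=None):
    """Return copies of schema and table config with one dimension column added."""
    schema = copy.deepcopy(schema)
    table_config = copy.deepcopy(table_config)
    dimensions = schema.setdefault("dimensionFieldSpecs", [])
    # Only existing dimension columns are checked for a name clash.
    if any(field["name"] == name for field in dimensions):
        raise ValueError(f"column {name!r} already exists")
    spec = {"name": name, "dataType": data_type}
    if default is not None:
        # Value used for rows that have no value for this column.
        spec["defaultNullValue"] = default
    dimensions.append(spec)
    if transform is not None:
        # Derived column: computed from existing columns by the given function.
        ingestion = table_config.setdefault("ingestionConfig", {})
        ingestion.setdefault("transformConfigs", []).append(
            {"columnName": name, "transformFunction": transform}
        )
    return schema, table_config


schema = {"schemaName": "case_history", "dimensionFieldSpecs": []}
table_config = {"tableName": "case_history", "tableType": "OFFLINE"}

# Hypothetical column filled with a default for older segments.
schema, table_config = add_dimension(schema, table_config, "store_type", "STRING", default="unknown")
# Hypothetical column derived from an existing create_time column.
schema, table_config = add_dimension(
    schema, table_config, "create_time_hours", "LONG", transform="toEpochHours(create_time)"
)
print(json.dumps(schema, indent=2))
print(json.dumps(table_config, indent=2))
```

Both changes only add columns, so they stay within the backward-compatible case discussed in this thread.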
@1705ayush: *How to ingest data into Pinot on Kubernetes using native batch ingestion?* Hi, I am trying to ingest CSV data into Pinot deployed on Kubernetes using the LaunchDataIngestionJob arg. I have verified that the table has been created in Pinot and that the job-spec and CSV data are present on the node. This is my Kubernetes Job manifest:
```
apiVersion: batch/v1
kind: Job
metadata:
  name: pinot-case-offline-ingestion
  namespace: my-pinot-kube
spec:
  template:
    spec:
      containers:
        - name: pinot-load-case-offline
          image: apachepinot/pinot:0.3.0-SNAPSHOT
          args: ["LaunchDataIngestionJob", "-jobSpecFile", "/opt/data/table-configs/case_history/job-spec.yml"]
          volumeMounts:
            - name: mount-data
              mountPath: /opt/data
      restartPolicy: OnFailure
      volumes:
        - name: mount-data
          hostPath:
            path: /opt/data
  backoffLimit: 100
```
After applying this Job, nothing happens, and this is the log of the pod:
```
SegmentGenerationJobSpec:
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner, segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner, segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
includeFileNamePattern: glob:**/*.csv
inputDirURI: /opt/data/csv_data/case_prod_data
jobType: SegmentCreationAndTarPush
outputDirURI: /pinot-segments/case_history
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: ''}
pinotFSSpecs:
- {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
pushJobSpec: null
recordReaderSpec:
  className: org.apache.pinot.plugin.inputformat.csv.CSVRecordReader
  configClassName: org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig
  configs: {delimiter: '|', multiValueDelimiter: ''}
  dataFormat: csv
segmentNameGeneratorSpec:
  configs: {segment.name.prefix: case_history, exclude.sequence.id: 'true'}
  type: normalizedDate
tableSpec: {schemaURI: null, tableConfigURI: null, tableName: case_history}
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
```
Am I ingesting the data incorrectly?
  @fx19880617: I think you are missing pushJobSpec?
  @fx19880617: ```pushJobSpec: null```
  @1705ayush: Hi @fx19880617, thank you for helping. I tried adding pushJobSpec to the job-spec:
```
pushJobSpec:
  pushParallelism: 2
  pushAttempts: 2
  pushRetryIntervalMillis: 1000
```
But the job completes with no errors, and the pod log is:
```
SegmentGenerationJobSpec:
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner, segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner, segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
includeFileNamePattern: glob:**/*.csv
inputDirURI: /opt/data/csv_data/case_prod_data
jobType: SegmentCreationAndTarPush
outputDirURI: /pinot-segments/case_history
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: ''}
pinotFSSpecs:
- {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
pushJobSpec: {pushAttempts: 2, pushParallelism: 2, pushRetryIntervalMillis: 1000, segmentUriPrefix: null, segmentUriSuffix: null}
recordReaderSpec:
  className: org.apache.pinot.plugin.inputformat.csv.CSVRecordReader
  configClassName: org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig
  configs: {delimiter: '|', multiValueDelimiter: ''}
  dataFormat: csv
segmentNameGeneratorSpec:
  configs: {segment.name.prefix: case_history, exclude.sequence.id: 'true'}
  type: normalizedDate
tableSpec: {schemaURI: null, tableConfigURI: null, tableName: case_history}
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
```
  @fx19880617: ok
  @fx19880617: what’s the logs for the job?
  @1705ayush: Here is the `kubectl describe` output for the job:
```
16:26:48:ayush@:pinot :alien: kubectl -n my-pinot-kube describe jobs.batch pinot-case-offline-ingestion
Name:           pinot-case-offline-ingestion
Namespace:      my-pinot-kube
Selector:       controller-uid=25b4e843-b600-4de2-a2ad-584ac8ce17b5
Labels:         controller-uid=25b4e843-b600-4de2-a2ad-584ac8ce17b5
                job-name=pinot-case-offline-ingestion
Annotations:    <none>
Parallelism:    1
Completions:    1
Start Time:     Fri, 05 Mar 2021 16:26:41 -0500
Completed At:   Fri, 05 Mar 2021 16:26:44 -0500
Duration:       3s
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=25b4e843-b600-4de2-a2ad-584ac8ce17b5
           job-name=pinot-case-offline-ingestion
  Containers:
   pinot-load-case-offline:
    Image:      apachepinot/pinot:0.3.0-SNAPSHOT
    Port:       <none>
    Host Port:  <none>
    Args:
      LaunchDataIngestionJob
      -jobSpecFile
      /opt/data/table-configs/case_history/job-spec.yml
    Environment:  <none>
    Mounts:
      /opt/data from mount-data (rw)
  Volumes:
   mount-data:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/data
    HostPathType:
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  27s   job-controller  Created pod: pinot-case-offline-ingestion-mfvrx
  Normal  Completed         24s   job-controller  Job completed
```
Below is the job spec file for reference. What should the pinotClusterSpecs.controllerURI value be? I tried changing it to gibberish and got the same logs. I think my value of pinotClusterSpecs.controllerURI is incorrect.
```
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/opt/data/csv_data/case_prod_data'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/pinot-segments/case_history'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
  configs:
    delimiter: '|'
    multiValueDelimiter: ''
tableSpec:
  tableName: 'case_history'
pinotClusterSpecs:
  # - controllerURI: 'pinot-controller:9000'
  - controllerURI: ''
segmentNameGeneratorSpec:
  type: normalizedDate
  configs:
    segment.name.prefix: 'case_history'
    exclude.sequence.id: true
pushJobSpec:
  pushParallelism: 2
  pushAttempts: 2
  pushRetryIntervalMillis: 1000
```
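On the `controllerURI` question: a push job needs a controller address it can actually reach, and both values in the spec above (an empty string, and the commented-out `pinot-controller:9000` without a scheme) are not full URLs. The thread does not establish that this was the cause of the silent run. The Python sketch below only shows a pre-flight check one could run on a parsed job spec. `check_job_spec` is a hypothetical helper, and the `http://` scheme in the second example is an assumption; the host and port come from the commented-out line in the spec.

```python
from urllib.parse import urlparse


def check_job_spec(spec):
    """Return a list of problems found in a parsed ingestion job spec (a dict)."""
    problems = []
    # Only job types that push segments need a controller address.
    if "Push" in spec.get("jobType", ""):
        clusters = spec.get("pinotClusterSpecs") or []
        if not clusters:
            problems.append("pinotClusterSpecs is missing")
        for cluster in clusters:
            uri = (cluster or {}).get("controllerURI") or ""
            parsed = urlparse(uri)
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append(f"controllerURI {uri!r} is not an http(s) URL")
    return problems


# The value from the spec above.
as_posted = {
    "jobType": "SegmentCreationAndTarPush",
    "pinotClusterSpecs": [{"controllerURI": ""}],
}
# Assumed form: scheme added to the commented-out host and port.
with_scheme = {
    "jobType": "SegmentCreationAndTarPush",
    "pinotClusterSpecs": [{"controllerURI": "http://pinot-controller:9000"}],
}

print(check_job_spec(as_posted))
print(check_job_spec(with_scheme))
```

The first call reports the empty URI; the second returns an empty list. Loading the YAML file into a dict is left out so the sketch stays within the standard library.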
  @fx19880617: Then is there data in `/opt/data/csv_data/case_prod_data`?
  @1705ayush: Yes. I checked by running an Ubuntu container and opening a bash shell in it. There is data present at this path.
  @fx19880617: can you try a newer image as well
  @fx19880617: ```apachepinot/pinot:0.6.0```
  @fx19880617: 0.3.0 is a very old image, and I cannot recall its details
  @1705ayush: OK, I changed the image and it worked. At the very end of the log it says ```Response for pushing table case_history segment case_history to location - 200: {"status":"Successfully uploaded segment: case_history of table: case_history"}```
  @1705ayush: But I am wondering why I cannot query it in the Pinot query UI
  @1705ayush: No records are returned by the query `select * from case_history limit 10`
  @fx19880617: hmm
  @fx19880617: it should be
  @1705ayush: Seems like another issue that I have to look into. But anyway, thank you very much @fx19880617 for your prompt responses and help. The new image worked out well.
  @fx19880617: can you check pinot server log?
  @fx19880617: seems like so
  @1705ayush: ok. I do see some errors on pinot-server.
  @1705ayush:
```
2021/03/05 20:45:00.943 INFO [HelixServerStarter] [Start a Pinot [SERVER]] Starting Pinot server
2021/03/05 20:45:00.944 INFO [HelixServerStarter] [Start a Pinot [SERVER]] Initializing Helix manager with zkAddress: pinot-zookeeper:2181, clusterName: pinot-quickstart, instanceId: Server_pinot-server-0.pinot-server-headless.my-pinot-kube.svc.cluster.local_8098
2021/03/05 20:45:02.560 INFO [HelixServerStarter] [Start a Pinot [SERVER]] Initializing server instance and registering state model factory
2021/03/05 20:45:51.252 INFO [HelixServerStarter] [Start a Pinot [SERVER]] Connecting Helix manager
2021/03/05 20:46:42.537 WARN [ClientCnxn] [Start a Pinot [SERVER]-SendThread(pinot-zookeeper:2181)] Client session timed out, have not heard from server in 31084ms for sessionid 0x0
2021/03/05 20:46:44.353 WARN [ParticipantHealthReportTask] [Start a Pinot [SERVER]] ParticipantHealthReportTimerTask already stopped
2021/03/05 20:47:10.343 WARN [CallbackHandler] [Start a Pinot [SERVER]] Callback handler received event in wrong order. Listener: org.apache.helix.messaging.handling.HelixTaskExecutor@2767bcd8, path: /pinot-quickstart/INSTANCES/Server_pinot-server-0.pinot-server-headless.my-pinot-kube.svc.cluster.local_8098/MESSAGES, expected types: [CALLBACK, FINALIZE] but was INIT
2021/03/05 20:47:11.245 INFO [HelixServerStarter] [Start a Pinot [SERVER]] Instance config for instance: Server_pinot-server-0.pinot-server-headless.my-pinot-kube.svc.cluster.local_8098 has instance tags: [DefaultTenant_OFFLINE, DefaultTenant_REALTIME], host: pinot-server-0.pinot-server-headless.my-pinot-kube.svc.cluster.local, port: 8098, no need to update
2021/03/05 20:47:11.249 INFO [HelixServerStarter] [Start a Pinot [SERVER]] Using class: org.apache.pinot.server.api.access.AllowAllAccessFactory as the AccessControlFactory
2021/03/05 20:47:11.455 INFO [HelixServerStarter] [Start a Pinot [SERVER]] Starting server admin application on:
2021/03/05 20:47:13.650 WARN [ClientCnxn] [Start a Pinot [SERVER]-SendThread(pinot-zookeeper:2181)] Session 0x10001285ff10004 for server pinot-zookeeper/10.107.87.233:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_282]
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_282]
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_282]
    at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_282]
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) ~[?:1.8.0_282]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:75) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-b2d716d9c465eaf69685f8e284015de5cd7b038e]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-b2d716d9c465eaf69685f8e284015de5cd7b038e]
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-b2d716d9c465eaf69685f8e284015de5cd7b038e]
2021/03/05 20:47:46.344 WARN [ZKHelixManager] [ZkClient-EventThread-16-pinot-zookeeper:2181] KeeperState:Disconnected, SessionId: 10001285ff10004, instance: Server_pinot-server-0.pinot-server-headless.my-pinot-kube.svc.cluster.local_8098, type: PARTICIPANT
Mar 05, 2021 8:48:39 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:8097]
Mar 05, 2021 8:48:40 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
2021/03/05 20:48:41.841 WARN [ZKHelixManager] [ZkClient-EventThread-16-pinot-zookeeper:2181] KeeperState:Disconnected, SessionId: 10001285ff10004, instance: Server_pinot-server-0.pinot-server-headless.my-pinot-kube.svc.cluster.local_8098, type: PARTICIPANT
2021/03/05 20:50:17.063 WARN [ZKHelixManager] [ZkClient-EventThread-16-pinot-zookeeper:2181] KeeperState:Disconnected, SessionId: 10001285ff10004, instance: Server_pinot-server-0.pinot-server-headless.my-pinot-kube.svc.cluster.local_8098, type: PARTICIPANT
2021/03/05 20:51:06.653 ERROR [StartServiceManagerCommand] [Start a Pinot [SERVER]] Failed to start a Pinot [SERVER] at 368.2 since launch
org.apache.helix.HelixException: fail to set config. cluster: pinot-quickstart is NOT setup.
    at org.apache.helix.ConfigAccessor.set(ConfigAccessor.java:300) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-b2d716d9c465eaf69685f8e284015de5cd7b038e]
    at org.apache.helix.manager.zk.ZKHelixAdmin.setConfig(ZKHelixAdmin.java:1092) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-b2d716d9c465eaf69685f8e284015de5cd7b038e]
    at org.apache.pinot.server.starter.helix.HelixServerStarter.start(HelixServerStarter.java:361) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-b2d716d9c465eaf69685f8e284015de5cd7b038e]
    at org.apache.pinot.tools.service.PinotServiceManager.startServer(PinotServiceManager.java:150) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-b2d716d9c465eaf69685f8e284015de5cd7b038e]
    at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:95) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-b2d716d9c465eaf69685f8e284015de5cd7b038e]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.lambda$run$0(StartServiceManagerCommand.java:260) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-b2d716d9c465eaf69685f8e284015de5cd7b038e]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-b2d716d9c465eaf69685f8e284015de5cd7b038e]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.access$000(StartServiceManagerCommand.java:57) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-b2d716d9c465eaf69685f8e284015de5cd7b038e]
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.run(StartServiceManagerCommand.java:260) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-b2d716d9c465eaf69685f8e284015de5cd7b038e]
2021/03/05 21:37:47.170 WARN [ConfigAccessor] [ZkClient-EventThread-16-pinot-zookeeper:2181] No config found at /pinot-quickstart/CONFIGS/RESOURCE/case_history_OFFLINE
```
  @1705ayush: I don't know why it is looking for pinot-quickstart configs
  @fx19880617: hmm when you start pinot server did you give a clustername?
  @1705ayush: I start Pinot using helm like this:
```
kubectl create ns my-pinot-kube
helm install pinot /home/ayush/spyne/incubator-pinot/kubernetes/helm/pinot -n my-pinot-kube --set replicas=1
```
  @fx19880617: hmmm
  @fx19880617: can you describe the statefulsets of pinot-controller and pinot-server and see what arguments they are started with?
  @1705ayush: ok. All the pinot workers are in running state. I do see these 2 errors on pinot-controller ```WARN [PinotInstanceRestletResource] [grizzly-http-server-1] Admin port is not set for instance: Server_pinot-server-0.pinot-server-headless.my-pinot-kube.svc.cluster.local_8098 ... ...``` ```WARN [PinotInstanceRestletResource] [grizzly-http-server-1] Grpc port is not set for instance: Controller_pinot-controller-0.pinot-controller-headless.my-pinot-kube.svc.cluster.local_9000 ... ...```
  @1705ayush: Or, I think this could mean something (log on pinot-controller):
```
WARN [SegmentStatusChecker] [pool-7-thread-2] Table case_history_OFFLINE has 1 segments with no online replicas
WARN [SegmentStatusChecker] [pool-7-thread-2] Table case_history_OFFLINE has 0 replicas, below replication threshold :1
```
  @fx19880617: this means your controller is up, but no pinot server is connected to the cluster
  @fx19880617: I feel something went wrong with the server setup
  @fx19880617: can you try to restart pinot-server pod and see if it's reconnecting?
  @1705ayush: yes. restarting the node
  @1705ayush: yes. restarting the node worked out! Thank you very much @fx19880617. :pray:
  @fx19880617: cool!
  @fx19880617: I think the issue is that the Pinot server pod started before the Pinot controller, and the controller is what sets up the ZooKeeper structure for the cluster
  @fx19880617: so restart should fix it
  @1705ayush: Yes. Whenever I start using helm, ZooKeeper and the controller are the last ones to start, and because of that the server and broker take multiple restarts.
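One way to avoid the restart loop described above is to hold back the server and broker until the controller answers a health check. The Python sketch below shows the waiting logic only. `controller_is_healthy` and `wait_until` are hypothetical helpers, the `/health` path on the controller and the `http://pinot-controller:9000` address are assumptions to verify for your deployment, and on Kubernetes the same idea would more likely live in an init container or a readiness probe than in a script.

```python
import time
import urllib.error
import urllib.request


def controller_is_healthy(base_url, timeout=2.0):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx response.
        return False


def wait_until(probe, attempts=30, delay_s=5.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, sleeping between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False


# Example wiring (not executed here, since it needs a running cluster):
# ready = wait_until(lambda: controller_is_healthy("http://pinot-controller:9000"))
```

Passing the probe and the sleep function as arguments keeps the retry loop testable without a cluster.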

#segment-write-api


@npawar: Hey @yupeng , here's the branch i'm working on:
@npawar: i have a basic no-frills file based impl in there. everything is sync and single threaded at the moment.
@npawar: But should be good enough if you want to start trying it out in your flink connector POC
@npawar: if you do try it, lmk if you have any feedback
@yupeng: thanks!
@yupeng: that’s fast