<h3><u>#general</u></h3><br><strong>@somanshu.jindal: </strong>@somanshu.jindal
has joined the channel<br><strong>@somanshu.jindal: </strong>Hi all, I was
trying out realtime ingestion in Pinot following the docs:
<https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMdTeAXadp8BL3QinSdRtJdqF7hckgVpJ77N6aIHLFxaXdh8R-2FkcEA4nQ11ltv-2BHIwbrzcyzAWB-2FXjfYIvyo3q0eRZGiLxRWgQRBEMyeB-2FaiIj5v9mTaEZwdJoWK221JX4Q-3D-3DNkyB_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTx5dmck3M6r8v4JDXYyYiCeW67ym-2BIAwcGPWd1wyGN7Ea3Yzf8bkrMyke-2BYhmP9MvcIojmZ3LEIYjm1MBlncCD-2FINf5B51keWh1NfmJRtG9AfZsChBNT2EsubWBpEBwXb1Ltm4l-2FGKAR-2Fl3yn0BnAh4FZ1mExwSIwkkBjb9EMskuSxewr7QeuxF3M4eVfG1rys-3D>
In the query console I am unable to query the timestamp field and I get errors.
Any idea why this is happening?<br><strong>@npawar: </strong>I have changed the
column name to “timestampInEpoch” in the Getting Started pages, so that folks
don’t hit this error again @somanshu.jindal
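<br>A hedged aside (not stated in the thread): the usual cause of this error is that `timestamp` is a reserved keyword in Pinot's SQL dialect, so a column with that exact name has to be quoted in queries. A minimal sketch, with a hypothetical table name:
```-- hypothetical table name; quoting the column avoids the reserved-keyword parse error
SELECT "timestamp" FROM myTable LIMIT 10```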
<br><h3><u>#random</u></h3><br><strong>@somanshu.jindal:
</strong>@somanshu.jindal has joined the
channel<br><h3><u>#troubleshooting</u></h3><br><strong>@jackie.jxt:
</strong>The ideal state looks correct. What is the query you sent? Is the data
partitioned by a column?<br><strong>@pradeepgv42: </strong>```select
count(distinct(<column_name>)) from <table> ```
column_name % 64 is how the producer decides which Kafka partition to send the data to
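<br>A hedged aside, not from the thread: if the goal is to let Pinot exploit that producer-side partitioning, recent versions allow declaring it in the table config so the broker can prune segments when a query filters on the partition column (keeping the thread’s `<column_name>` placeholder):
```"tableIndexConfig": {
  "segmentPartitionConfig": {
    "columnPartitionMap": {
      "<column_name>": { "functionName": "Modulo", "numPartitions": 64 }
    }
  }
},
"routing": {
  "segmentPrunerTypes": ["partition"]
}```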
<br><strong>@jackie.jxt: </strong>Without any filter, the query should
hit all segments<br><strong>@jackie.jxt: </strong>Can you paste the external
view of the table?<br><strong>@pradeepgv42: </strong><br><strong>@pradeepgv42:
</strong>Ah, this is interesting, there are some segments in ERROR
state<br><strong>@pradeepgv42: </strong>And I only see 32 segments
here<br><strong>@jackie.jxt: </strong>You might need to open the server log to
see what's going wrong with the ERRORed segments<br><strong>@pradeepgv42:
</strong>yup yup, trying that<br><strong>@pradeepgv42: </strong>Curious, what
does external view imply?<br><strong>@jackie.jxt: </strong>You can think of
Pinot cluster management as a state machine<br><strong>@jackie.jxt:
</strong>Ideal state is the desired state, external view is the actual
state<br><strong>@jackie.jxt: </strong>What command did you use to get the
ideal state/external view?<br><strong>@pradeepgv42: </strong>swagger
api<br><strong>@pradeepgv42: </strong>on the controller
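<br>For reference, the same information is available from the controller REST API that the Swagger UI calls under the hood; a hedged sketch assuming the controller listens on localhost:9000 and the table is named myTable:
```# hypothetical host/table; these endpoints return the ideal state and external view as JSON
curl "http://localhost:9000/tables/myTable/idealstate"
curl "http://localhost:9000/tables/myTable/externalview"```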
<br><strong>@pradeepgv42: </strong>ah there’s a `Caused by:
java.lang.OutOfMemoryError: Direct buffer memory`<br><strong>@pradeepgv42:
</strong>I have two servers consuming from this Kafka topic, with the ideal segment size set
to 150MB; initial segment sizes are ~70MB per partition.
So that leaves me at max (150MB * 32) ~4.8G, or (150MB * 64) ~9.6G if we need both segments
while swapping.
Machine size is 16G and I didn’t change the default settings of the Pinot servers.
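<br>A hedged aside, not from the thread: consuming segments are held in off-heap (direct) buffers, so it is the JVM direct-memory cap rather than the heap that is exhausted here. Assuming the servers are started via pinot-admin.sh (the sizes below are illustrative, not a recommendation from the thread), the cap can be raised explicitly:
```# illustrative sizing for a 16G machine; leave headroom for heap and the OS page cache
export JAVA_OPTS="-Xms4G -Xmx4G -XX:MaxDirectMemorySize=10G"
bin/pinot-admin.sh StartServer -zkAddress localhost:2181```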
<br><strong>@pradeepgv42: </strong>Any suggestions on how to think about the
amount of memory to allocate? or machine size?<br><strong>@g.kishore:
</strong>@npawar can you point him to the provisioning tool<br><strong>@npawar:
</strong>I don't think there's a doc for that, but this blog has all the
details:
<https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMULmwXpUy0vBvQDjipJvea-2B1E47iEO0mvL4H3-2FQjn-2FwdnaGaShQy-2FTaAueBS0jK6Lr5Jzj8AtV42fXa0BudKOGwYhVjpaTjPmO9-2FDBNumbyEFoyqsfU1tUQs9sN-2FV9E3-2FiGwVvP0-2BD5HZ2ZFTuoenI0AERuzLEJKH7MgaAhAPyBkg_eg_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTx5dmck3M6r8v4JDXYyYiCesJQ-2B3qlBjwsGvkfuOSHKJa9bFI1OmzOBMAUwY8dnFqZwpktusOzM4Avt-2Fy0WB-2FTfzfkv2EgG7LJUhgqgHtheJ-2BpDycD-2BvcIdaHdr0YZjmzHxgGEYn9RfAgOqtTwLE4eVjsYt16Cti3O4C7fujxQZqjsYHd-2BK-2F6hNAWE-2Fr60ZRTA-3D><br><strong>@pradeepgv42:
</strong>Thanks, will go over it.
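<br>For completeness, the provisioning tool mentioned above is presumably the RealtimeProvisioningHelper command shipped with pinot-admin; its flags vary by version, so treat this as a hedged sketch and check the command's -help output for your build:
```# paths and values are placeholders; the tool prints estimated memory per host
# for the given combinations of host counts and consumption hours
bin/pinot-admin.sh RealtimeProvisioningHelper \
  -tableConfigFile /path/to/realtime-table-config.json \
  -numPartitions 64 \
  -numHosts 2,4,6 \
  -numHours 6,12,24 \
  -sampleCompletedSegmentDir /path/to/one/completed/segment/```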
<br><strong>@quietgolfer: </strong>I have a Kubernetes batch job that runs a LaunchDataIngestionJob. If the job fails, the
Kubernetes job is still marked as succeeded and completed. This seems like a
bug. I'd expect it to indicate that the job failed.
``` kubectl get pods --namespace $NAMESPACE
NAME                              READY   STATUS      RESTARTS   AGE
...
pinot-populate-local-data-hwpdm   0/1     Completed   0          14s```
```kubectl logs --namespace $NAMESPACE pinot-populate-local-data-hwpdm
...
java.lang.RuntimeException: Caught exception during running -
org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
...```
```kubectl describe --namespace $NAMESPACE pod/pinot-populate-local-data-hwpdm
...
Status: Succeeded```<br><strong>@quietgolfer: </strong>```# TODO - is outputDirURI set correctly?
apiVersion: v1
kind: ConfigMap
metadata:
  name: pinot-local-data-config
data:
  local_batch_job_spec.yaml: |-
    executionFrameworkSpec:
      name: 'standalone'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
      segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
      segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
    jobType: SegmentCreationAndTarPush
    inputDirURI: '/home/pinot/local-raw-data/'
    outputDirURI: '/tmp/metrics/segments/'
    overwriteOutput: true
    pinotFSSpecs:
      - scheme: file
        className: org.apache.pinot.spi.filesystem.LocalPinotFS
    recordReaderSpec:
      dataFormat: 'json'
      className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
    tableSpec:
      tableName: 'metrics'
      schemaURI: '<https://u17000708.ct.sendgrid.net/ls/click?upn=iSrCRfgZvz-2BV64a3Rv7HYatTHeqKcLaMSt8ep4ihF7RXtXGgZxp5hTRHO-2BY-2B6tuCTOWdECu2nvqZ7jax-2FyFgbw-3D-3DbQGJ_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTx5dmck3M6r8v4JDXYyYiCeXSTl4QKLSvcODYEFcFQ2XbO1ZxQGtBNzkmb8oTZu8gyvTsQvytzCcTMuNcz8XoyW57DtD-2B7-2B3IUWHpBABhVI26176s9vuvZlnRF-2Fnme5vzlhS2DSW-2FfHLwUIzEvKYdcUqn7g3qfmZ-2BojnhIpLw83ku9IOFnwtd-2B3cxr1V728kCI-3D>'
      tableConfigURI: '<https://u17000708.ct.sendgrid.net/ls/click?upn=iSrCRfgZvz-2BV64a3Rv7HYatTHeqKcLaMSt8ep4ihF7RXtXGgZxp5hTRHO-2BY-2B6tuCaOm-2FZ-2BCvPv7tFdFPs-2Fbz5A-3D-3DWaKw_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTx5dmck3M6r8v4JDXYyYiCeu4phGPDJc-2F69-2BUz1JueV8543d2bqTd5VNWVeUIgYIqr3tOwdn4Spkza4ShRU-2FiC2TlB0LK7EFVWKDoPFprXdYhx0SVdXdwqtnsv5R-2BQcluyvXolFzqZ3PVRN5M-2FnPoA5yLQQYC7PTtaY44hAgFGxdy0QRMjBICnIa2d8fqVLONI-3D>'
    pinotClusterSpecs:
      - controllerURI: '<https://u17000708.ct.sendgrid.net/ls/click?upn=iSrCRfgZvz-2BV64a3Rv7HYatTHeqKcLaMSt8ep4ihF7R7meK-2BGMYOH71C9ZJr7fDkHtzN_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTx5dmck3M6r8v4JDXYyYiCeXqkPGQ6TVVkkqHsyARtcxElaxvYabOXaNHDjm4W7Li5fv8GXpTdMs6cRTCoAakW9Q8eV4giBrhd-2BnTK4Ttp4-2FafdjnsiZMIrjyv1wuITX7lAzLxte-2F1T-2FCCM1jACgicO-2FcmJX6-2F8XWQsziQ-2BMqF53cVXI2PfwFqMpqGat0BbgZM-3D>'
---
apiVersion: batch/v1
kind: Job
metadata:
  name: pinot-populate-local-data
spec:
  template:
    spec:
      containers:
        - name: pinot-populate-local-data
          image: apachepinot/pinot:0.4.0
          args: [ "LaunchDataIngestionJob", "-jobSpecFile", "/home/pinot/pinot-config/local_batch_job_spec.yaml" ]
          volumeMounts:
            - name: pinot-local-data-config
              mountPath: /home/pinot/pinot-config
            - name: pinot-local-data
              mountPath: /home/pinot/local-raw-data
      restartPolicy: OnFailure
      volumes:
        - name: pinot-local-data-config
          configMap:
            name: pinot-local-data-config
        - name: pinot-local-data
          hostPath:
            path: /my/local/path
  backoffLimit: 100```<br><strong>@quietgolfer: </strong>This isn't blocking me
but I'd imagine this would lead to quality bugs in
production.<br><strong>@fx19880617: </strong>I will take a look, it would be
helpful if you can paste the stacktrace or create an
issue<br><strong>@fx19880617: </strong>so I can check why the job is not
failing<br>
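A hedged workaround sketch, not from the thread: until the launcher propagates a non-zero exit code, the Kubernetes Job can be made to fail by wrapping the command in a shell that inspects the log output. The script path and grep pattern are assumptions based on the apachepinot/pinot image layout:
```# container spec fragment; overrides the image entrypoint with a shell wrapper (paths are assumptions)
command: [ "sh", "-c" ]
args:
  - |
    /opt/pinot/bin/pinot-admin.sh LaunchDataIngestionJob \
      -jobSpecFile /home/pinot/pinot-config/local_batch_job_spec.yaml 2>&1 | tee /tmp/ingestion.log
    ! grep -q "Exception" /tmp/ingestion.log```
With restartPolicy: OnFailure this also lets Kubernetes retry the ingestion up to backoffLimit times.<br>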