<h3><u>#general</u></h3><br><strong>@yupeng: </strong>hi team, I’d like to hear
your thoughts on this issue
<https://u17000708.ct.sendgrid.net/ls/click?upn=1BiFF0-2FtVRazUn1cLzaiMSfW2QiSG4bkQpnpkSL7FiK3MHb8libOHmhAW89nP5XKI1pSQGgFuD-2BXBsUHb4ZzZg-3D-3DoJEQ_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTzxuE24xTazj5IF8jydOi3B-2BdbJRgm52iUl5-2BOEKMeImBnI9seGQ0e7ikAb1-2BXiLVmP16ZmI8-2BRAuxE-2FsFy3uyqK4XUN6Afz9tpiagKKnnznHqcpzNYB7-2FZuyNc9zWI3x-2BEA-2BS90E8d8uonDadZpqW7OrCDMB6JbWzNqF4RaHygo-2B-2Ffi9p1bzFcGSG0pJenngk-3D><br><h3><u>#troubleshooting</u></h3><br><strong>@quietgolfer:
</strong>Hi. I'm getting the following error related to my realtime table
ingesting events from kafka. `insertions` is just a long metric column. It's
not a dimension. I'm not sure why it's hitting this.
```Metrics aggregation cannot be turned ON in presence of dictionary encoded
metrics, eg: insertions```
<br><strong>@steotia: </strong>For realtime metrics aggregation, all metric
columns should be noDictionary<br><strong>@mayanks: </strong>This
^^<br><strong>@mayanks: </strong>Also dimension columns cannot be no
dict<br><strong>@steotia: </strong>we should probably consider adding support
for realtime metrics aggregation with dictionary encoded metric columns and
remove this restriction unless it is infeasible<br><strong>@g.kishore:
</strong>we can, but it is inefficient<br><strong>@quietgolfer: </strong>I don't
think these metrics need to be dictionary encoded. I assumed metric columns by
default would not be dictionary encoded.<br><strong>@steotia: </strong>all
columns by default are dictionary encoded AFAIK<br><strong>@quietgolfer:
</strong>Looks like the docs say that. I'll update my config. Thanks!
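<br>For reference, the config change being described here would look roughly like the following under `tableIndexConfig` (a sketch only; `insertions` is the metric column from this thread, and `aggregateMetrics` is assumed to be the flag that turns on realtime metrics aggregation):
```
"tableIndexConfig": {
  "loadMode": "MMAP",
  "aggregateMetrics": true,
  "noDictionaryColumns": [
    "insertions"
  ],
  ...
}
```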
<br><strong>@quietgolfer: </strong>Separate topic, my `pinot-server-0` log is
showing that my params in `streamConfigs`
were `supplied but isn't a known config`. Any ideas?
```2020/06/26 17:26:05.729 WARN [ConsumerConfig]
[HelixTaskExecutor-message_handle_thread] The configuration
'stream.kafka.hlc.zk.connect.string' was supplied but isn't a known config.
2020/06/26 17:26:05.729 WARN [ConsumerConfig]
[HelixTaskExecutor-message_handle_thread] The configuration
'realtime.segment.flush.threshold.size' was supplied but isn't a known config.
2020/06/26 17:26:05.737 WARN [ConsumerConfig]
[HelixTaskExecutor-message_handle_thread] The configuration
'stream.kafka.decoder.class.name' was supplied but isn't a known config.
2020/06/26 17:26:05.737 WARN [ConsumerConfig]
[HelixTaskExecutor-message_handle_thread] The configuration 'streamType' was
supplied but isn't a known config.
...```
Here is what my table config looks like (with some repetitive parts removed)
```
{
  "tableName": "metrics",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestamp",
    "timeType": "MILLISECONDS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "5",
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "metrics",
    "replication": "1",
    "replicasPerPartition": "1"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": [
          "platform_id",
          ... my other dimensions
        ],
        "skipStarNodeCreationForDimensions": [
        ],
        "functionColumnPairs": [
          "SUM__insertions",
          ... my other metrics
        ]
      }
    ],
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "simple",
      "stream.kafka.topic.name": "metrics-realtime",
      ... all of the configs are ignored
    }
  },
  "tenants": {},
  "metadata": {
    "customConfigs": {}
  }
}
```<br><strong>@npawar: </strong>those are coming from the kafka consumer.
because we put the entire streamConfigs map from pinot into the consumer
configs. those should be harmless.<br><strong>@g.kishore: </strong>a good
beginner task to fix<br><strong>@quietgolfer: </strong>Cool. I was hitting an
issue where my Pinot server temporarily stopped ingesting events from Kafka and
was trying to figure out if there was an issue. It's working again and still
logging those lines.<br><strong>@pradeepgv42: </strong>Hi, I am having trouble
querying the recent data from Pinot, `select * from <tablename> order by
timestamp limit 10` returns empty.
I also see some of the old segments to be in error state from the broker logs
```
Resource: <tablename>_REALTIME, partition:
<tablename>__7__0__20200625T1913Z is in ERROR state
Resource: <tablename>_REALTIME, partition:
<tablename>__8__0__20200625T1913Z is in ERROR state
Resource: <tablename>_REALTIME, partition:
<tablename>__9__0__20200625T1913Z is in ERROR state ```
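<br>As an aside, the broker log above only says that these segments are in ERROR state; one way to see the full per-segment state for the table is to compare its ideal state with its external view via the controller REST API. The endpoint paths below are an assumption based on the standard Pinot controller API (adjust host and port for your deployment):
```
curl -X GET --header 'Accept: application/json' 'http://<controller-host>:<controller-port>/tables/<tableName>/idealstate'
curl -X GET --header 'Accept: application/json' 'http://<controller-host>:<controller-port>/tables/<tableName>/externalview'
```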
<br><strong>@pradeepgv42: </strong>Wondering how I should go about debugging
this?<br><strong>@mayanks: </strong>The server log should show you why the
segment went into error state.<br><strong>@pradeepgv42: </strong>Above is from
server logs, for example: `<tablename>__9__0__20200625T1913Z` is one of
the segments that went into error state (though it’s an older
segment)<br><strong>@pradeepgv42: </strong>I don’t see any logs on recent
segments<br><strong>@mayanks: </strong>Can you grep for one of the segments in
error state in all the server logs (if you have separate logs per day)? It
should show some error/exception that caused this to
happen.<br><strong>@mayanks: </strong>Alternatively, you can try to re-start
the server and see if the segment still gets into ERROR
state.<br><strong>@pradeepgv42: </strong>tried restarting, seems like a helix
issue, let me dig a bit more<br><strong>@mayanks: </strong>Are these old
segments still within your retention period?<br><strong>@pradeepgv42:
</strong>yeah these are within ttl<br><strong>@pradeepgv42: </strong>for example
one of the exceptions I see in the server logs
```
Sleep for 10000ms as service status has not turned GOOD:
IdealStateAndCurrentStateMatchServiceStatusCallback:partition=<tablename>__57__0__20200625T1913Z, expected=ONLINE, found=OFFLINE, creationTime=1593202611703, modifiedTime=1593202650032, version=95, waitingFor=CurrentStateMatch, resource=<tablename>_REALTIME, numResourcesLeft=1, numTotalResources=1, minStartCount=1,;IdealStateAndExternalViewMatchServiceStatusCallback:Init;
Caught exception in state transition from OFFLINE -> ONLINE for resource: <tablename>_REALTIME, partition: <tablename>__57__0__20200625T1913Z
java.lang.RuntimeException: org.apache.pinot.spi.utils.retry.AttemptsExceededException: Operation failed after 3 attempts
    at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.downloadAndReplaceSegment(RealtimeTableDataManager.java:286) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
    at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:252) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
    at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:132) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
    at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:164) [pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) ~[?:?]```
<br><strong>@g.kishore: </strong>did you set up deep store for
Pinot?<br><strong>@g.kishore: </strong>also in the
controller<br><strong>@pradeepgv42: </strong>yup I did<br><strong>@pradeepgv42:
</strong>For example
`<https://u17000708.ct.sendgrid.net/ls/click?upn=iSrCRfgZvz-2BV64a3Rv7HYXS2B8cms5oWW28KNpfKTsh6piBjoaaV6QhCqBmE5sUHhBye58ZrKlaiDRtBPSCCE51v5XZeDfyD1S3hnoqXdIliQR6Y2Wkb5R19TUhRhwsEyU8OLfwrN29kq0O3r-2Bfqsg-3D-3DmbVl_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTzxuE24xTazj5IF8jydOi3B5wgxt0j2oUIvMqFLGxCCaE3faCEtG5UgaGxNZXyMK89C4ka8ViFiCiDJ2KZjVrUBtjQ1OeZ2u4NwK4mAgyLmvogCYie46Ttl3GizNj0ntXzSzLyLuB4a6Dg8nQmuNM-2FTrtE6-2FcZjFg1gVbYJh36XiEEAxGuj-2F5Qh8JrO6NdC7EM-3D>
/segments/{tableName}/servers` api returns segments distributed across two
servers (currently i have tagged one as realtime and other as
offline)<br><strong>@g.kishore: </strong>does it print the segment uri in the
log?<br><strong>@g.kishore: </strong>also whats the metadata for that segment
```curl -X GET --header 'Accept: application/json'
'<https://u17000708.ct.sendgrid.net/ls/click?upn=iSrCRfgZvz-2BV64a3Rv7HYV9TJa4jsQAayVdH8o3CuHA-3DufqM_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTzxuE24xTazj5IF8jydOi3BtV43ptDuEpRu0SmGJnECS1162XrXff5o5PXcbx-2FBJUrpMo4gzgq-2BCSVEJKcx3RRoDY3LPLlL1gYtF3J-2BRbHnjrQbqaWyAN0PXieW34-2B1hmnbeY9WQ87f-2BAGrneyXg-2BpixnORagv0oqXr53tIJPc8mnX2MqBxBemqNe8JoO72H58-3D>:port/segments/<tableName>/<segment_name>/metadata'```<br><strong>@pradeepgv42:
</strong>I don’t actually see any segments downloaded into the offline tagged
machine, I guess swagger api was just showing me the ideal state.
```
Got temporary error status code: 500 while downloading segment from:
https://u17000708.ct.sendgrid.net/ls/click?upn=iSrCRfgZvz-2BV64a3Rv7HYfnYZT1b59-2BCtX0F9uq9RLCpM3QBUUVUhd7oXmBCkx6bUPzoO-2BEuohDBm80ug6n8nczMpCuuzrJzIJpk-2FSWGG4laQgi-2FvazT-2BQXkB25D6W2zFP-R_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTzxuE24xTazj5IF8jydOi3BBOv4rL11kf0h5vxlzrq4GWdlRnsD8c5I3eS8f2jEv8SE0juBbavvVseIrYS-2FBZ5e4lMCWP5eA0pxhnZzWGuf3U5-2FLlGrjmu6qc0n5jKVsVuOw4uC-2Fpsh8zv82dUsnaQ2RTt1sr6JQHGCSqag7vXC1700IJbfCp5dFw8UO9TBMOA-3D
3Z to:
/home/ubuntu/pinot/data/<table>_REALTIME/<table>__60__0__20200625T1913Z.tar.gz
org.apache.pinot.common.exception.HttpErrorStatusException: Got error status
code: 500 (Internal Server Error) with reason: "Failed to read response into
file:
/home/ubuntu/data/fileDownloadTemp/<table>/<table>__60__0__20200625T1913Z-1386265827082994"
while sending request:
https://u17000708.ct.sendgrid.net/ls/click?upn=iSrCRfgZvz-2BV64a3Rv7HYfnYZT1b59-2BCtX0F9uq9RLCpM3QBUUVUhd7oXmBCkx6bUPzoO-2BEuohDBm80ug6n8nczMpCuuzrJzIJpk-2FSWGG4mNX7bYCzJS-2FSY86ntWpwCHKSKL_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTzxuE24xTazj5IF8jydOi3Bx-2Fs-2B4EsdJ8q9cic48NJoDVOK69SP-2FXQaXQd4Cs7TP9BZJpDgW4CQZX0ebtsxlIqOvLFdtFzkq7GT07oEVuiWypdcrBhYU56bRghoMmdSmJu7HulfFt7Gk8BjEzUnJrZjnoj5xxkGqvq-2Fph07mA8XYeZqd8dED-2BG2gPIyHbG2xW8-3D
to controller: <ip>, version: Unknown```
<br><strong>@pradeepgv42: </strong>```{
"segment.realtime.endOffset": "78125",
"segment.time.unit": "MILLISECONDS",
"segment.start.time": "1593031658266",
"segment.flush.threshold.size": "78125",
"segment.realtime.startOffset": "0",
"segment.end.time": "1593080340734",
"segment.total.docs": "78125",
"segment.table.name": "<table>_REALTIME",
"segment.realtime.numReplicas": "1",
"segment.creation.time": "1593112417336",
"segment.realtime.download.url":
"https://u17000708.ct.sendgrid.net/ls/click?upn=iSrCRfgZvz-2BV64a3Rv7HYfnYZT1b59-2BCtX0F9uq9RLCpM3QBUUVUhd7oXmBCkx6bUPzoO-2BEuohDBm80ug6n8nbCWJjQacoXVEVgtCZ4nN8HGZXplD0mo-2Fq3hL7XdhISxY5f-_vGLQYiKGfBLXsUt3KGBrxeq6BCTMpPOLROqAvDqBeTzxuE24xTazj5IF8jydOi3B0PR9K5-2FYxCy6Arm3bi4K5yhyUegd8Y-2Bc5vaxOVFihBPZScptmO-2BmWCIri8985ptaxkmUeS0BAX2Ms7BRv7EEadkzED6FOGD0aZSJ6W-2BiCOHkdXK59txRV3GKRoRDJzoPpEsawJwk9YlTCUfPx6WZ41RjyPso5-2FzTqnMwufqFHAM-3D
"segment.name": "<table>__57__0__20200625T1913Z",
"segment.index.version": "v3",
"custom.map": null,
"segment.flush.threshold.time": null,
"segment.type": "REALTIME",
"segment.crc": "4231660115",
"segment.realtime.status": "DONE"
}```<br><strong>@pradeepgv42: </strong>is the download url supposed to be
something else?<br><strong>@g.kishore: </strong>it should be a valid url
pointing to your s3<br><strong>@pradeepgv42: </strong>I do see the segments in
S3 though<br><strong>@pradeepgv42: </strong>let me double check my
config<br><strong>@g.kishore: </strong>but that url seems to be pointing to
controller<br><strong>@pradeepgv42: </strong>yup<br><strong>@g.kishore:
</strong>and not s3<br><strong>@g.kishore: </strong>that means segments were
getting uploaded to controller<br><strong>@g.kishore: </strong>which is the
default<br><strong>@pradeepgv42: </strong>weird, I see the exact segment being
present in S3<br><strong>@pradeepgv42: </strong>and some of the latest segments
are getting uploaded into S3 too<br><strong>@pradeepgv42: </strong>I found this
in the controller logs
```
Could not get directory entry for
s3://<bucket>/<dir>/<table>/<table>__57__0__20200625T1913Z
Copy
/home/ubuntu/data/fileUploadTemp/<table>__57__0__20200625T1913Z.0943bed4-49e7-4bec-999b-35e9802b3d73
from local to
s3://<bucket>/<dir>/<table>/<table>__57__0__20200625T1913Z
Processing segmentCommitEnd(Server_<ip>_8098, 78125)
Committing segment <table>__57__0__20200625T1913Z at offset 78125 winner
Server_<ip>_8098
Committing segment metadata for segment: <table>__57__0__20200625T1913Z```
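<br>For context on the download URL question below: when a consuming segment commits, the completed segment is uploaded to wherever the controller's data dir points, and that location is what gets recorded as the segment's download URL; if the data dir is the controller's local disk (the default), the URL points at the controller itself rather than S3. Configuring the controller to use S3 as the segment store looks roughly like the sketch below (keys and values are assumptions for Pinot ~0.4.0 with the S3 filesystem plugin; verify against the docs for your version, and mirror the `pinot.server.storage.factory.*` / `pinot.server.segment.fetcher.*` settings on the servers):
```
controller.data.dir=s3://<bucket>/<dir>
controller.local.temp.dir=/tmp/pinot-controller-tmp
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=<aws-region>
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```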
<br><strong>@pradeepgv42: </strong>Okay, I think I got a rough picture of
what’s going on: on the REALTIME server, the helix state for the segment was
changed from ONLINE to OFFLINE and then to DROPPED, and in parallel, on the
OFFLINE server, the segment state was changed from OFFLINE to ONLINE. The
REALTIME server part seems to have gone through, but the OFFLINE part is stuck
because of the S3 issue.
1. Still not sure why the segment download url is set to the `controller ip`
2. IIUC, does having 1 replica imply that some segments might not be available
for querying while they are being moved across servers? <br>
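On question 2: with a single replica there is no second copy to serve queries from, so a segment can be temporarily unqueryable while it is being moved between servers (here, while the OFFLINE-tagged server's segment download is failing). The usual mitigation is to raise the replica counts in `segmentsConfig`, roughly as sketched below (values are illustrative):
```
"segmentsConfig": {
  ...
  "replication": "2",
  "replicasPerPartition": "2"
}
```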