[ https://issues.apache.org/jira/browse/SPARK-37122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Biswa Singh updated SPARK-37122:
--------------------------------
    Description: 
This issue is similar to 
https://issues.apache.org/jira/browse/SPARK-35237?focusedCommentId=17340723&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17340723.
 We receive the following warning:

 

{noformat}
21:00:26.277 [rpc-server-4-2] WARN  o.a.s.n.s.TransportChannelHandler - Exception in connection from /10.198.3.179:51184
java.lang.IllegalArgumentException: Too large frame: 5135603447297303916
    at org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)
    at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Unknown Source)
{noformat}
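
For what it's worth, the frame size in the warning looks like the start of a plain HTTP request read as an RPC frame length. Below is a minimal sketch of the arithmetic (illustration only, not Spark code; it assumes, from my reading of TransportFrameDecoder, that the decoder reads an 8-byte big-endian length and subtracts the 8-byte length field itself):

{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustration only: decode the first 8 bytes of an HTTP request line the way a
// length-prefixed frame decoder would, assuming an 8-byte big-endian length that
// includes the length field itself (so the decoder subtracts 8).
public class FrameSizeSketch {
    public static void main(String[] args) {
        byte[] firstEight = "GET /met".getBytes(StandardCharsets.US_ASCII);
        long rawLength = ByteBuffer.wrap(firstEight).getLong(); // big-endian by default
        long reported = rawLength - 8;                          // minus the 8-byte length field
        System.out.println(reported);                           // 5135603447297303916, as in the warning
    }
}
{code}

If that reading is correct, the number is consistent with an HTTP client connecting to a port that speaks the Spark RPC protocol rather than HTTP.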

 

Below are other details related to Prometheus. Please scroll down for the details of the issue:

 
{noformat}
Prometheus Scrape Configuration
===============================
- job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2

tcptrack command output in spark3 pod
======================================
10.198.22.240:51258  10.198.40.143:7079  CLOSED 10s 0 B/s
10.198.22.240:51258  10.198.40.143:7079  CLOSED 10s 0 B/s
10.198.22.240:50354  10.198.40.143:7079  CLOSED 40s 0 B/s
10.198.22.240:33152  10.198.40.143:4040  ESTABLISHED 2s 0 B/s
10.198.22.240:47726  10.198.40.143:8090  ESTABLISHED 9s 0 B/s

10.198.22.240 = Prometheus pod IP

10.198.40.143 = test pod IP

Issue
======
Although the scrape config is expected to scrape only port 8090 (via the last relabel
rule above; see the sketch just after this block), Prometheus tries to initiate scrapes
on ports such as 7079, 7078, and 4040 on the spark3 pod, hence the exception in the
spark3 pod. Is this really a Prometheus issue, or something on the Spark side? We don't
see any such exception in any of the other pods. All our pods, including spark3, are
annotated with:

annotations:
   prometheus.io/port: "8090"
   prometheus.io/scrape: "true"

We get the metrics and everything works fine; we just see this extra warning for the
exception.{noformat}
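
For reference, the last relabel rule above is the one that should rewrite __address__ to the annotated port 8090. A minimal sketch of the intended rewrite follows (Prometheus joins the source_labels with ';' and applies an anchored RE2 regex; java.util.regex is used here purely for illustration, and the sample values are assumed):

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration of the __address__ relabel rule above, not Prometheus code.
public class RelabelSketch {
    public static void main(String[] args) {
        // __address__ and the port meta label joined with ';' (sample values assumed)
        String joined = "10.198.40.143:7079;8090";
        // same regex as the relabel rule, anchored the way Prometheus anchors it
        Pattern rule = Pattern.compile("^([^:]+)(?::\\d+)?;(\\d+)$");
        Matcher m = rule.matcher(joined);
        if (m.matches()) {
            // replacement "$1:$2" keeps the host and substitutes the annotated port
            System.out.println(m.replaceFirst("$1:$2")); // prints 10.198.40.143:8090
        } else {
            // if the port meta label is empty, the regex does not match and
            // __address__ is left as-is, so the original container port gets scraped
            System.out.println(joined);
        }
    }
}
{code}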
 


> java.lang.IllegalArgumentException Related to Prometheus
> --------------------------------------------------------
>
>                 Key: SPARK-37122
>                 URL: https://issues.apache.org/jira/browse/SPARK-37122
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.2, 3.1.1
>            Reporter: Biswa Singh
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
